Abstract

Rare events can potentially occur in many applications. When manifested as opportunities to be 
exploited, risks to be ameliorated, or certain features to be extracted, such events become of paramount 
significance. Due to their sporadic nature, the information-bearing signals associated with rare events 
often lie in a large set of irrelevant signals and are not easily accessible. This paper provides a statistical 
framework for detecting such events so that an optimal balance between detection reliability and agility, as
two opposing performance measures, is established. The core component of this framework is a sampling 
procedure that adaptively and quickly focuses the information-gathering resources on the segments of 
the dataset that bear the information pertinent to the rare events. Particular focus is placed on Gaussian 
signals with the aim of detecting signals with rare mean and variance values. 

1 Introduction 

The problem of searching for scarce and at the same time significant events with certain statistical behavior 
in observed information streams is a classical one and has attracted attention in a wide variety of fields over 
the past few decades. Such events can broadly model at least three categories of problems. One concentrates 
on seeking opportunities in a range of domains including trading in finance [1], [2], [3] and spectrum sensing
in telecommunication [4], [5]. The second category pertains to minimizing risks and avoiding catastrophes in
applications such as risk analysis in econometrics [6], [7], [8], [9], [10], [11], blackout cascade avoidance in energy
systems [12], [13], intrusion identification in network security [14], [15], [16], [17], [18], [19], fraud detection [20], [21],
and seismology. The third category deals with extracting certain data segments with pre-specified
probabilistic features, with applications in image and video analysis, medical diagnosis, neuroscience, and
remote sensing (seismic, sonar, radar, biomedical) [22], [23], [24].

While detecting the rare events in some of these applications is time-insensitive, in other applications
time is of the essence and it is important to devise timely and reliable decision-making mechanisms. Such
time-sensitivity can be due to the transient nature of opportunities that are attractive only when detected
quickly, to the substantial costs that risks can incur if not detected and managed swiftly, or to the need
for real-time processing of the information.

All these applications can be modeled in terms of a collection of events that can be broadly categorized
into two groups. One group constitutes the majority of the events, which are deemed normal and occur most
of the time, and the other group consists of the events that occur rarely but are of extreme significance to
the observer. In this paper, without loss of generality, we assume that all normal events share
the same statistical behavior and that all rare events also share identical statistical behavior, albeit
different from that of the normal events. This dichotomous model serves mainly to focus attention on the
discrepancy between the rare and the normal events and can be easily generalized to models that involve
multiple statistical behaviors for each group. Hence, we assume that the observer has access to a collection
of $n$ sequences of random observations $X^1, \dots, X^n$, each modeling one event. Each sequence $X^i$ consists of
independent and identically distributed (i.i.d.) measurements $X^i = \{X^i_1, X^i_2, \dots\}$ taking values in the set
$\mathcal{X} = \mathbb{R}$ endowed with a $\sigma$-field $\mathcal{F}$ of events, obeying one of the two hypotheses

$$
\begin{aligned}
\mathsf{H}_0 &: \; X^i_j \sim F_0, \quad j = 1, 2, \dots \\
\mathsf{H}_1 &: \; X^i_j \sim F_1, \quad j = 1, 2, \dots
\end{aligned}
\tag{1}
$$

where $F_0$ and $F_1$ denote the cumulative distribution functions (cdfs) of two distinct distributions on $(\mathcal{X}, \mathcal{F})$.
The distribution $F_0$ models the statistical behavior of the normal events and the distribution $F_1$ models the
statistical behavior of the rare events. For convenience, we assume that $F_0$ and $F_1$ have probability density
functions (pdfs) $f_0$ and $f_1$, respectively. Each sequence is generated by $F_0$ or $F_1$ independently of the
rest of the sequences and we assume that hypothesis $\mathsf{H}_1$ (a rare event) occurs with prior probability $\epsilon_n$. In
order to incorporate the rareness of the sequences generated by $F_1$, we assume that $\epsilon_n = o(1)$.

The goal of quick search is to identify one or more rare events among all $n$ given events through 1)
designing an information-gathering process for collecting information from the sequences $X^1, \dots, X^n$, and
2) delineating optimal decision rules. Designing a quick search process involves a tension between two
performance measures, one being the aggregate amount of information accumulated (i.e., the number of
observations made) and the other being the reliability (or cost) of the decision. In this paper we design an
optimal information-gathering process that maximizes the decision reliability subject to a hard¹ constraint
on the aggregate number of observations we are allowed to make from $X^1, \dots, X^n$.

1.1 Related Literature 

The quick search problem is closely related to the sequential detection literature, with two major differences.
First, the existing approaches in sequential detection that can be applied to the quick search problem at hand
often optimize a balance between decision reliability and a soft² constraint on the number of observations.
This is in contrast with the quick search setting, which enforces a hard constraint on the sensing resources.
Second, sequential detection often aims to identify all rare events or only one rare event, whereas our setting
offers the flexibility to identify one or more rare events. The most relevant sequential detection solutions for
identifying all and only one rare event (generated by $F_1$) are the sequential probability ratio test (SPRT) [25]
and the cumulative sum (CUSUM) test [26], respectively. More specifically, by denoting the true hypothesis
and a decision about sequence $X^i$ by $\mathsf{T}_i \in \{\mathsf{H}_0, \mathsf{H}_1\}$ and $\mathsf{D}_i \in \{\mathsf{H}_0, \mathsf{H}_1\}$, respectively, the Type-I and Type-II
detection error probabilities corresponding to sequence $X^i$ are

$$
\mathsf{P}^1_i \triangleq P(\mathsf{D}_i = \mathsf{H}_1 \mid \mathsf{T}_i = \mathsf{H}_0) \quad \text{and} \quad \mathsf{P}^2_i \triangleq P(\mathsf{D}_i = \mathsf{H}_0 \mid \mathsf{T}_i = \mathsf{H}_1) .
\tag{2}
$$

When the objective is to identify all rare events (generated by $F_1$), minimizing the average number of
observations made with constraints on $\mathsf{P}^1_i$ and $\mathsf{P}^2_i$ can be decomposed into minimizing the average number of
observations necessary for deciding between $\mathsf{H}_0$ and $\mathsf{H}_1$ for each individual sequence with the same reliability
constraints [27]. The optimal test for each sequence, which is the test that requires the smallest number of observations
and satisfies the reliability constraints, is the SPRT [25]. On the other hand, when the objective is to identify
only one rare event (generated by $F_1$) there exist, broadly, two classes of solutions. In one class $n$ is finite
and it is known a priori that there exists only one rare event (this is called the scanning problem). The optimal
detector that identifies the rare event for the Brownian motion setting is given in [28] and [29]. In the other
class $n = \infty$, so that it is ensured that almost surely there exists a rare event. For this class the optimal
test, which is the test that minimizes the average delay subject to a constraint on the Type-II error probability,
is the CUSUM test [26].

We also remark briefly on the connection between the quick search problem and the problem of sparse support
recovery in noisy environments. These two problems become equivalent (in settings and not objectives)
when the events space is mapped to the sparse signal space, the rare events are mapped to the constituents
of the support of the sparse signal, and the stochastic observations of the normal and rare events follow the
distributions of noisy observations of the normal and sparse components of the sparse signal. A recent line of
research in sparse recovery relevant to the quick search problem is the notion of adaptive sampling (sensing)
in high-dimensional noisy data with sparsity structures, in which the objective is to estimate the support of
the signal [30], [31]. The works in [30], [31], and [32] propose adaptive sampling procedures that effectively
focus the sensing resources on the segments of the data that have a higher likelihood of containing the support.
Despite the similarities in the settings, however, there is one major discrepancy in the objectives of sparse
recovery and quick search: the equivalent of the goal of quick search in the sparse recovery setting
is identifying one subset of the support of a desired length, whereas the goal of sparse recovery is to
identify the entire support of the signal. It is noteworthy that such a notion of fractional support recovery is
also studied in the context of compressive sensing through non-adaptive data acquisition [33]. The research
in [33] considers a sparse signal with known support size (unlike our setting, in which the number of rare events is
stochastic), takes one set of low-dimensional observations, and analyzes the interplay among the sampling
rate, the fraction of the support to be recovered, and the necessary and sufficient conditions on signal power
for recovering the support reliably.

¹By a hard constraint we mean that the aggregate number of observations made cannot exceed a specified level.
²By a soft constraint we mean that the expected number of observations made cannot exceed a specified level.

1.2 Contributions 

The objective of quick search in this paper is to identify a fraction of the rare events (as opposed to the
existing literature, which targets identifying either all or only one of the rare events). This less-investigated
objective aims to fill the gap between the two well-studied extreme (all or one) objectives. This shift of
objective allows the detector to tolerate a higher level of Type-II errors, i.e., the detector can afford to miss
some of the rare events, in favor of quickly detecting the fraction of interest of the rare events (i.e., sequences
generated by $F_1$). In general the number of rare events is a random variable between 0 and $n$ and is unknown
a priori.

Besides the objective, another major distinction from the existing literature is the constraint enforced
on the sampling resources. This paper considers the less-investigated scenario in which there exists a hard
constraint on the sampling resources. This is in contrast with the existing literature, which often incorporates
the statistical average of the sampling budget in the analysis. We remark that, to the best of our knowledge,
the general problem of designing the optimal sampling strategy, which distributes a limited sampling budget
among the sequences under observation in order to yield the highest level of reliability in detecting one, a
fraction, or all of the sequences generated by $F_1$, is an open problem. In this paper we focus on the asymptotic
performance when the number of sequences $n$ is sufficiently large and provide the asymptotically optimal
sampling procedure when the distributions $F_0$ and $F_1$ are Gaussian with either different mean or different
variance values.

To solve the quick search problem with the aforementioned objective and constraint, we design a sequential 
and data-adaptive information-gathering procedure and detection rule. 



1.3 Summary of Results 

An adaptive and sequential sampling process is designed that dynamically makes one of the following three
data-driven decisions at each time: 1) it lacks sufficient evidence to identify the rare events and decides
to postpone the decision and collect more data (observation), or 2) it reaches enough confidence to eliminate
roughly a $(1-\alpha)$ fraction, for some $\alpha \in (0, 1)$, of the events that are deemed to be the weakest candidates for being rare events,
but does not yet have enough evidence to make the final decision (refinement), or 3) it has accumulated
enough information to identify the rare events of interest (detection). As one major result of this paper we
characterize the asymptotically optimal (in the asymptote of a large number of events $n$) allocation of the
sampling resources among the sequences and the pertinent sampling procedure, which consists of consecutive
rounds of coarse observations and refinement actions, followed by consecutive cycles of fine observations. The
number of each of these cycles (refinement and observation) is also determined as a function of the
available sampling budget.

It is confirmed in different contexts that in order to recover the rare events (or the support of the sparse
signal) reliably, the power of the observed data should scale with the data size $n$ (cf. [34], [33], [30], [31],
and [32]). We apply the proposed adaptive sampling procedure to the setting in which the normal and rare events
are generated according to Gaussian distributions with either different mean or different variance values
and characterize necessary and sufficient conditions on the scaling rates of the mean and variance values for identifying
the desired number of rare events. These scaling rates of the means and variances are functions of the
data dimension $n$, the available amount of sampling budget, and the frequency of the rare events. More
specifically, if we denote the likelihood of an event being a rare event by $\epsilon_n$ and define $\varepsilon_n \in (0, 1)$ as

$$
\varepsilon_n \triangleq \frac{\ln (n \epsilon_n)}{\ln n} ,
\tag{3}
$$

then for successfully identifying a small fraction of the rare events when the normal events are distributed as
$\mathcal{N}(\mu_0, 1)$ and the rare events as $\mathcal{N}(\mu_1, 1)$ via the adaptive procedure, a necessary and sufficient condition is
that in the asymptote of large $n$

log n c 

where $c$ is a constant determined primarily by the available sampling budget and the number of refinement
cycles. We also show that when the normal and rare events are distributed as $\mathcal{N}(0, \lambda_0)$ and $\mathcal{N}(0, \lambda_1)$,
respectively, the counterpart necessary and sufficient condition is

$$
\frac{\ln (\lambda_0 / \lambda_1)}{\ln n} > \frac{2 (1 - \varepsilon_n)}{c} .
\tag{5}
$$
Finally, in order to assess the gains of the data-adaptive sampling process in comparison with a non-adaptive
procedure, we assess the scaling laws of the mean and variance values. It is shown that when the adaptive and
non-adaptive procedures enjoy the same sampling budget, the mean and variance values exhibit
the same behavior as in (4) and (5), respectively, with the exception that the constant $c$ is decreased to
c • , where $\alpha \in (0, 1)$ was defined earlier and $K$ is related to the number of refinement cycles. In another
interpretation of the results, when the mean and variance have identical scaling behavior in both adaptive
and non-adaptive sampling settings, for achieving an identical level of reliability in detecting the rare events,
the adaptive procedure requires roughly on the order of less sampling resources.

2 Problem Statement 
2.1 Sampling Model 

With the ultimate objective of identifying $T_n = o(n \epsilon_n)$ rare events (i.e., a small fraction of the rare events), the proposed
sampling procedure is initiated by making observations from all sequences $X^1, \dots, X^n$. Based on these rough
observations, a fraction of the sequences that are least likely to have been generated by $F_1$ are discarded and the rest are
retained for further and more accurate scrutiny. Repeating this procedure successively refines the search
support and progressively focuses the observations on the more promising sequences. More specifically, at
each time the sampling procedure selects a subset of the sequences $X^1, \dots, X^n$ and takes one sample from
each of these sequences. Upon collecting these samples, it takes one of the following actions:

$A_1$ (Detection): stops further sampling and identifies $T_n$ sequences that have the highest likelihood of
being rare events (generated by $F_1$);

$A_2$ (Observation): continues to further observe the same set of sequences in order to gather more
information about their statistical behavior; or

$A_3$ (Refinement): discards a portion of the sequences and declares that they are most likely normal
events (generated by $F_0$). When a sequence is discarded it is deemed a weak candidate for being
a rare event (generated by $F_1$) and is not observed anymore, while the remaining sequences are
retained for more scrutiny. By denoting the number of sequences retained prior to a refinement action
by $\ell$, the number of sequences that this action discards is $(1 - \alpha)(\ell - T_n)$ for some $\alpha \in (0, 1)$. Discarding
the sequences at this rate ensures that at least $T_n$ sequences will be retained for the final detection
action (action $A_1$).

We denote the set of the indices of the sequences observed at time $t \in \mathbb{N}$ by $\mathcal{L}_t$. As we initialize the
information-gathering procedure by including all sequences for observation, we have $\mathcal{L}_1 = \{1, \dots, n\}$. Also,
we denote the stopping time of the procedure, i.e., the time after which detection (action $A_1$) is performed, by
$\tau$. Furthermore, we define the switching function $\psi : \{1, \dots, \tau\} \to \{0, 1\}$ to model actions $A_2$ (observation)
and $A_3$ (refinement). At each time $1 \le t \le \tau - 1$ we set $\psi(t) = 0$ if we decide in favor of performing
observation, while $\psi(t) = 1$ indicates a decision in favor of performing refinement, i.e.,

$$
\forall t \in \{1, \dots, \tau - 1\} : \quad \psi(t) =
\begin{cases}
0 & \text{action } A_2 \text{ and } \mathcal{L}_{t+1} = \mathcal{L}_t \\
1 & \text{action } A_3 \text{ and } \mathcal{L}_{t+1} \subset \mathcal{L}_t
\end{cases}
\tag{6}
$$

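Let $X^i_t$ denote the observation made from sequence $i \in \mathcal{L}_t$ at time $t$ and denote the $\sigma$-algebra generated by
the observations $\{X^i_1, \dots, X^i_t\}$ by

$$
\forall i \in \mathcal{L}_t : \quad \mathcal{F}^i_t \triangleq \sigma(X^i_1, \dots, X^i_t) .
\tag{7}
$$

Given $\mathcal{F}^i_t$, we denote the posterior probability that the sequence $X^i$, for $i \in \mathcal{L}_t$, is generated by $F_1$ by
$\pi^i_t \triangleq P(\mathsf{T}_i = \mathsf{H}_1 \mid \mathcal{F}^i_t)$. Invoking the independence among the observations $\{X^i_1, \dots, X^i_t\}$ provides

$$
\pi^i_t = \left[ 1 + \frac{1 - \epsilon_n}{\epsilon_n} \prod_{j=1}^{t} \frac{f_0(X^i_j)}{f_1(X^i_j)} \right]^{-1} .
\tag{8}
$$

By defining the likelihood ratio

$$
\Lambda^i_t \triangleq \prod_{j=1}^{t} \frac{f_0(X^i_j)}{f_1(X^i_j)} ,
\tag{9}
$$

we have

$$
\pi^i_t = \left[ 1 + \frac{1 - \epsilon_n}{\epsilon_n} \, \Lambda^i_t \right]^{-1} ,
\tag{10}
$$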

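and the actions $A_1$, $A_2$, and $A_3$ can be formalized as follows.

$A_1$: At the stopping time $\tau$ identify the set $\mathcal{U} \subseteq \mathcal{L}_\tau$ as the detector's decision according to

$$
\begin{aligned}
\mathcal{U} &= \arg\max_{U \subseteq \mathcal{L}_\tau : \, |U| = T_n} P\big(\forall i \in U : \mathsf{T}_i = \mathsf{H}_1 \mid \{\mathcal{F}^i_\tau : i \in \mathcal{L}_\tau\}\big) \\
&= \arg\max_{U \subseteq \mathcal{L}_\tau : \, |U| = T_n} \prod_{i \in U} P(\mathsf{T}_i = \mathsf{H}_1 \mid \mathcal{F}^i_\tau) \\
&= \arg\max_{U \subseteq \mathcal{L}_\tau : \, |U| = T_n} \prod_{i \in U} \pi^i_\tau \\
&= \text{the indices of the } T_n \text{ smallest elements of the set } \{\Lambda^i_\tau : i \in \mathcal{L}_\tau\} .
\end{aligned}
\tag{11}
$$

$A_2$: At time $t$ the decision is to further measure the same set of sequences (i.e., $\mathcal{L}_{t+1} = \mathcal{L}_t$), and set

$$
\psi(t) = 0 .
\tag{12}
$$

$A_3$: At time $t$ the decision is to refine $\mathcal{L}_t$ and the set $\mathcal{L}_{t+1}$ is obtained as

$$
\begin{aligned}
\mathcal{L}_{t+1} &= \arg\max_{U \subseteq \mathcal{L}_t : \, |U| = a} P\big(\forall i \in U : \mathsf{T}_i = \mathsf{H}_1 \mid \{\mathcal{F}^i_t : i \in \mathcal{L}_t\}\big) \\
&= \text{the indices of the } a \text{ smallest elements of the set } \{\Lambda^i_t : i \in \mathcal{L}_t\} ,
\end{aligned}
\tag{13}
$$

where by recalling the description of the refinement action we can write

$$
a = \lfloor \alpha (|\mathcal{L}_t| - T_n) \rfloor + T_n .
\tag{14}
$$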
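As an illustration only, the rules in (10), (11), (13), and (14) can be sketched in Python as follows. The function names (`posterior_rare`, `retained_count`, `refine`, `detect`) and the choice of storing log-likelihood ratios in a dictionary keyed by sequence index are assumptions made for this sketch and are not part of the formulation above. Log-likelihood ratios are used because the logarithm preserves the ordering of the likelihood ratios while avoiding overflow.

```python
import math


def posterior_rare(log_lr, eps):
    """Posterior [1 + (1 - eps) / eps * Lambda]^(-1), with Lambda = exp(log_lr).

    log_lr is ln(Lambda) = sum_j [ln f0(x_j) - ln f1(x_j)] for one sequence.
    """
    # a = ln((1 - eps) / eps * Lambda), so the posterior is 1 / (1 + e^a).
    a = math.log1p(-eps) - math.log(eps) + log_lr
    if a >= 0:
        e = math.exp(-a)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(a))


def retained_count(num_retained, alpha, t_n):
    """Number of sequences kept by a refinement: floor(alpha * (|L_t| - T_n)) + T_n."""
    return math.floor(alpha * (num_retained - t_n)) + t_n


def smallest_indices(log_lr_by_index, k):
    """Indices of the k smallest likelihood ratios (ties broken by index)."""
    ranked = sorted(log_lr_by_index, key=lambda i: (log_lr_by_index[i], i))
    return set(ranked[:k])


def refine(log_lr_by_index, alpha, t_n):
    """Refinement action: keep the sequences with the smallest likelihood ratios."""
    keep = retained_count(len(log_lr_by_index), alpha, t_n)
    return smallest_indices(log_lr_by_index, keep)


def detect(log_lr_by_index, t_n):
    """Detection action: declare the T_n sequences with the smallest likelihood ratios."""
    return smallest_indices(log_lr_by_index, t_n)


# Hypothetical log-likelihood ratios for six retained sequences.
llr = {0: 2.1, 1: -3.0, 2: 0.4, 3: 1.7, 4: -0.2, 5: 3.3}
kept = refine(llr, alpha=0.5, t_n=1)   # keeps floor(0.5 * 5) + 1 = 3 sequences
declared = detect(llr, t_n=1)
```

Because the posterior in (10) is a decreasing function of the likelihood ratio, ranking by the smallest likelihood ratios is the same as ranking by the largest posteriors, which is why a single sorting helper serves both actions in this sketch.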



2.2 Optimal Sampling 

Characterizing the experimental design and decision rules for performing quick search involves an interplay
between two performance measures, one being the aggregate number of observations made and the other being
the frequency of erroneous detection. Any improvement in either of these performance measures penalizes
the other one, and the optimal design of such procedures typically involves optimizing a tradeoff between
them. Furthermore, once the optimal aggregate number of observations is determined, we also need to
ascertain how these observations should be split and allotted to the refinement and detection actions. For
a given stopping time $\tau$ and a given sequence of switching functions $\psi(\tau) \triangleq \{\psi(1), \psi(2), \dots, \psi(\tau - 1)\}$, the
probability of erroneous detection, that is, the probability that the detected set of sequences includes a normal
event (generated by $F_0$), is

$$
P_n(\tau, \psi(\tau)) \triangleq P\big( |\{i \in \mathcal{U} : \mathsf{T}_i = \mathsf{H}_0\}| \neq 0 \big) .
\tag{15}
$$

Our goal is to minimize this detection error probability over all possible stopping times $\tau$, all switching
rules $\psi$, and all possible allocations of the observations to refinement and detection actions subject to two
hard constraints. One constraint incorporates the impact of the aggregate number of observations made, and
the other one captures the cost of the refinement actions. Specifically, for the first constraint the aggregate
number of observations normalized by the number of sequences is constrained not to exceed a pre-specified
value $S$. As the second constraint, the number of refinement actions must also not exceed a pre-specified
value $K$. This latter constraint captures the cost incurred by each refinement action, which is the permanent
loss of the sequences discarded after the refinement actions. This optimization problem can be formalized as

$$
\mathcal{P}_n(S, K) \triangleq
\begin{cases}
\min_{\tau, \, \psi(\tau)} & P_n(\tau, \psi(\tau)) \\
\text{s.t.} & \frac{1}{n} \sum_{t=1}^{\tau} |\mathcal{L}_t| \le S \\
& \sum_{t=1}^{\tau - 1} \psi(t) \le K
\end{cases}
\tag{16}
$$
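The two hard constraints are straightforward to check for any candidate switching sequence, since the set sizes evolve deterministically under the refinement rule of Section 2.1. The following Python sketch tracks the set sizes and tests feasibility. The helper names `set_sizes` and `is_feasible` and the numerical values in the example are hypothetical and serve only as an illustration.

```python
import math


def set_sizes(n, t_n, alpha, switching):
    """Sizes |L_1|, ..., |L_tau| for a switching sequence psi(1), ..., psi(tau - 1).

    psi(t) = 0 keeps the current set (observation); psi(t) = 1 shrinks it to
    floor(alpha * (|L_t| - T_n)) + T_n sequences (refinement).
    """
    sizes = [n]
    for psi in switching:
        current = sizes[-1]
        if psi == 1:
            current = math.floor(alpha * (current - t_n)) + t_n
        sizes.append(current)
    return sizes


def is_feasible(n, t_n, alpha, switching, budget_s, max_refinements_k):
    """Check the sampling-budget and refinement-count constraints."""
    sizes = set_sizes(n, t_n, alpha, switching)
    # One sample is taken from every retained sequence at every time.
    normalized_samples = sum(sizes) / n
    return normalized_samples <= budget_s and sum(switching) <= max_refinements_k


# Hypothetical example: observe, refine, observe, refine, then detect (tau = 5).
sizes = set_sizes(n=1000, t_n=10, alpha=0.5, switching=[0, 1, 0, 1])
```

In this example the set shrinks from 1000 to 505 and then to 257 sequences, so the procedure consumes 3267 samples in total, or about 3.27 samples per sequence.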

We solve this problem in the asymptote of large $n$ for two hypothesis testing problems that test the mean and
the variance of Gaussian observations, i.e., the problems of the form (1) in which the hypotheses $\mathsf{H}_m$, for $m \in \{0, 1\}$,
are:

$$
\begin{aligned}
&\text{1) Gaussian mean: for } m \in \{0, 1\} : \quad \mathsf{H}_m : X^i_j \sim \mathcal{N}(\mu_m, 1) , \\
&\text{2) Gaussian variance: for } m \in \{0, 1\} : \quad \mathsf{H}_m : X^i_j \sim \mathcal{N}(0, \lambda_m) .
\end{aligned}
\tag{17}
$$

In this paper, for asymptotically large values of $n$, we characterize:

1. the optimal stopping time and sampling process (through designing the optimal switching sequence) 
that maximize the detection reliability subject to the hard constraints on the sampling resources and 
refinement actions; 

2. the asymptotic detection error $\mathcal{P}_n(S, K)$ for any given $S$ and $K$;

3. the minimum distance between the unknown pair of parameters in each test such that the two hypotheses
are guaranteed to be distinguished perfectly, i.e., $\mathcal{P}_n(S, K) \xrightarrow{n \to \infty} 0$;

4. efficiencies of the proposed sequential and adaptive sampling procedure with respect to that of the 
non-adaptive procedure that does not involve any refinement action and performs detection directly, 
i.e., K = 0; and 

5. comparisons with a few relevant tests. 

Note that among the three actions, action $A_2$ (observation) concentrates on accumulating further evidence
about the statistical behavior of the sequences under observation. Action $A_3$ (refinement) is intended to
purify the set of the sequences on which we ultimately perform the detection action. Each iteration of
the refinement actions monitors the events retained by the previous iterations and eliminates those deemed
least likely to be rare events. The core idea of the refinement action is that it is relatively easy to identify
sequences drawn from $F_1$ with low-quality measurements as the events of interest occur rarely ($\epsilon_n$ is small).
After iteratively performing actions $A_2$ and $A_3$, we will have a more condensed proportion of the desired
sequences to the non-desired ones. The sequences retained after the last action $A_3$ are further observed and
finally fed to the detector in order to identify $T_n$ sequences of interest.

Characterizing the detection error rate $P_n(\tau, \psi(\tau))$, as a result, depends on 1) the evolution of the proportion
of the number of the normal and rare events throughout the refinement actions, and 2) the quality
of the detector. For any time $t \in \{1, \dots, \tau\}$ let us partition the set $\mathcal{L}_t$, which includes the indices of the
sequences observed at time $t$, into two disjoint sets $\mathcal{L}^0_t$ and $\mathcal{L}^1_t$ as

$$
\mathcal{L}^m_t \triangleq \{i \in \mathcal{L}_t : \mathsf{T}_i = \mathsf{H}_m\} , \quad \text{for } m \in \{0, 1\} .
$$

Also let us define $n_t \triangleq |\mathcal{L}^1_t|$ and $\bar{n}_t \triangleq |\mathcal{L}^0_t|$ for all $t \in \{1, \dots, \tau\}$. As will be made clear in the subsequent
sections, implementing action $A_1$ (detection) necessitates obtaining the low-order statistics of the sets $\{\Lambda^i_\tau :
i \in \mathcal{L}^0_\tau\}$ and $\{\Lambda^i_\tau : i \in \mathcal{L}^1_\tau\}$ with sample sizes $\bar{n}_\tau$ and $n_\tau$, respectively. Evaluating the dynamics of the
refinement action ($A_3$) additionally requires analyzing the high-order statistics of the sets $\{\Lambda^i_t : i \in \mathcal{L}^0_t\}$ and
the low-order statistics of the sets $\{\Lambda^i_t : i \in \mathcal{L}^1_t\}$ for all $t \in \{1, \dots, \tau\}$. Moreover, $n_t$ and $\bar{n}_t$ are also random
variables that change (reduce) after each refinement action. Analyzing the evolution of the set sizes $n_t$ and
$\bar{n}_t$ as well as the necessary order statistics depends on the distributions $F_0$ and $F_1$.
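To make the interplay of the three actions concrete, the following Python sketch simulates the procedure for unit-variance Gaussian observations with different means. It is only an illustration under stated assumptions: the function name `quick_search`, the fixed switching sequence, the seed, and all parameter values are hypothetical choices made here and do not represent the optimal allocation characterized in this paper.

```python
import math
import random


def quick_search(n, eps, mu0, mu1, t_n, alpha, switching, seed=0):
    """Simulate observation, refinement, and detection for Gaussian means.

    switching holds psi(1), ..., psi(tau - 1); detection follows the sample
    taken at time tau = len(switching) + 1.
    """
    rng = random.Random(seed)
    # Each sequence is a rare event independently with probability eps.
    is_rare = [rng.random() < eps for _ in range(n)]
    retained = list(range(n))
    log_lr = [0.0] * n
    samples_used = 0
    for psi in list(switching) + [None]:
        for i in retained:
            x = rng.gauss(mu1 if is_rare[i] else mu0, 1.0)
            # ln f0(x) - ln f1(x) for unit-variance Gaussian densities.
            log_lr[i] += ((x - mu1) ** 2 - (x - mu0) ** 2) / 2.0
            samples_used += 1
        if psi == 1:
            # Refinement: keep the sequences with the smallest likelihood ratios.
            keep = math.floor(alpha * (len(retained) - t_n)) + t_n
            retained = sorted(retained, key=lambda i: (log_lr[i], i))[:keep]
    # Detection: declare the T_n sequences with the smallest likelihood ratios.
    detected = sorted(retained, key=lambda i: (log_lr[i], i))[:t_n]
    return detected, is_rare, samples_used


detected, is_rare, samples_used = quick_search(
    n=2000, eps=0.05, mu0=0.0, mu1=-4.0, t_n=5, alpha=0.5, switching=[1, 1, 0]
)
```

With two refinements the retained set shrinks from 2000 to 1002 and then to 503 sequences, so the four sampling times use 4008 samples, roughly half of the 8000 samples that observing every sequence four times would require.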



3 Preliminaries 

In this section we provide some definitions, notation, and basic results on the asymptotic statistical behavior
of order statistics that we frequently use throughout the rest of the paper. Given a sample of $m$ random
variables $Y_1, \dots, Y_m$, we denote the corresponding sequence of order statistics by $Y_{1:m}, \dots, Y_{m:m}$, where $Y_{r:m}$
is the $r$-th order statistic. When the $Y_i$'s are independent and drawn from the same parent distribution with
cdf $G$, the cdf of $Y_{r:m}$, denoted by $G_{r:m}$, is given by

$$
G_{r:m}(y) = \sum_{i=r}^{m} \binom{m}{i} [G(y)]^i [1 - G(y)]^{m-i} .
$$

When $m$ tends to infinity the limit distributions of $G_{1:m}$ and $G_{m:m}$ become degenerate as

$$
\lim_{m \to \infty} G_{1:m}(y) = \mathbb{1}_{\{G(y) > 0\}} \quad \text{and} \quad \lim_{m \to \infty} G_{m:m}(y) = \mathbb{1}_{\{G(y) = 1\}} .
$$
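The cdf of the $r$-th order statistic is the probability that at least $r$ of the $m$ i.i.d. samples fall at or below $y$, which is a binomial tail in $G(y)$. The short Python sketch below evaluates this tail in the log domain and shows numerically how the cdfs of the minimum and the maximum collapse to step functions as $m$ grows. The function name `order_stat_cdf` and the test values are assumptions made for illustration.

```python
import math


def order_stat_cdf(g, r, m):
    """P(Y_{r:m} <= y) when G(y) = g, i.e. P(Binomial(m, g) >= r)."""
    if g <= 0.0:
        return 0.0
    if g >= 1.0:
        return 1.0
    log_g, log_1mg = math.log(g), math.log1p(-g)
    total = 0.0
    for i in range(r, m + 1):
        # ln[C(m, i) g^i (1 - g)^(m - i)], computed with lgamma to avoid overflow.
        log_term = (
            math.lgamma(m + 1) - math.lgamma(i + 1) - math.lgamma(m - i + 1)
            + i * log_g + (m - i) * log_1mg
        )
        total += math.exp(log_term)
    return min(total, 1.0)


# Minimum (r = 1) and maximum (r = m) of m = 5 samples at a point where G(y) = 0.3.
cdf_min = order_stat_cdf(0.3, 1, 5)   # equals 1 - 0.7**5
cdf_max = order_stat_cdf(0.3, 5, 5)   # equals 0.3**5

# Degeneracy for large m: the minimum's cdf is close to 1 wherever G(y) > 0,
# and the maximum's cdf is close to 0 wherever G(y) < 1.
near_one = order_stat_cdf(0.01, 1, 5000)
near_zero = order_stat_cdf(0.99, 5000, 5000)
```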

A common approach to avoid degeneracy is to identify two appropriate sets of affine transformations of
$Y_1, \dots, Y_m$,

$$
W_i = a_m + b_m Y_i , \quad \text{for } i \in \{1, \dots, m\} ,
\tag{18}
$$

$$
\text{and} \quad \bar{W}_i = c_m + d_m Y_i , \quad \text{for } i \in \{1, \dots, m\} ,
\tag{19}
$$

that have non-degenerate limit distributions. Specifically, the cdfs of the order statistics $W_{1:m}$ and $\bar{W}_{m:m}$,
denoted by $Q_{1:m}(w; m)$ and $\bar{Q}_{m:m}(w; m)$, respectively, satisfy

$$
\lim_{m \to \infty} Q_{1:m}(w; m) = L(w) , \quad \forall w \in \mathbb{R} ,
\tag{20}
$$

and

$$
\lim_{m \to \infty} \bar{Q}_{m:m}(w; m) = H(w) , \quad \forall w \in \mathbb{R} ,
\tag{21}
$$

for some non-degenerate cdfs $L$ and $H$. The precise characterizations of $L$ and $H$ depend on the transformation
coefficients $a_m$, $b_m$, $c_m$, and $d_m$. The random variables $\{W_i\}_{i=1}^{m}$ and $\{\bar{W}_i\}_{i=1}^{m}$, in general, have
completely different statistical behavior in low, central, and high order statistics. In the analysis in this
paper we need the distributions of all kinds of orders, which are formally defined next.

Definition 1 The order statistic $Y_{r:m}$ is said to be of low order and the order statistic $Y_{m-r+1:m}$ is said to
be of high order iff $\lim_{m \to \infty} \frac{r}{m} = 0$.

Definition 2 The order statistic $Y_{r:m}$ is said to be of central order iff $\exists \, \zeta \in (0, 1)$ such that

$$
\lim_{m \to \infty} \sqrt{m} \left( \frac{r}{m} - \zeta \right) = 0 .
$$

Definition 3 (Domain of attraction) A given parent cdf $G$ for $\{Y_i\}_{i=1}^{m}$ is said to belong to the minimal
domain of attraction of a cdf $L$ if there exists at least one pair of sequences $\{a_m\}_{m=1}^{\infty}$ and $\{b_m\}_{m=1}^{\infty}$ such that
the sequence of random variables $\{W_i\}_{i=1}^{m}$ defined in (18) satisfies (20). Similarly, we say that $G$ belongs to
the maximal domain of attraction of a cdf $H$ if for at least one pair of sequences $\{c_m\}_{m=1}^{\infty}$ and $\{d_m\}_{m=1}^{\infty}$ the
sequence of random variables $\{\bar{W}_i\}_{i=1}^{m}$ defined in (19) satisfies (21).



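Finding the possible cdfs $L$ and $H$ corresponding to any given parent distribution $G$ has a rich literature; cf.
[35] or [36]. The well-known result is that only one parametric family is possible for $L$ and only one family
for $H$:

Theorem 1 (von-Mises family of distributions) The only non-degenerate family of distributions satisfying
(20) is of the form

$$
L(w) = 1 - \exp\left\{ - \left[ 1 + \kappa \, \frac{w - \chi}{\sigma} \right]^{1/\kappa} \right\} , \quad 1 + \kappa \, \frac{w - \chi}{\sigma} > 0 ,
\tag{22}
$$

and the only non-degenerate family of distributions satisfying (21) is of the form

$$
H(w) = \exp\left\{ - \left[ 1 - \kappa \, \frac{w - \chi}{\sigma} \right]^{1/\kappa} \right\} , \quad 1 - \kappa \, \frac{w - \chi}{\sigma} > 0 ,
\tag{23}
$$

where $\chi$, $\sigma > 0$, and $\kappa$ are the location, scale, and shape parameters, respectively.

Proof: See [35], [36]. ■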

Besides the limit distributions of minima and maxima, in this paper we also need to find the asymptotic
non-degenerate distributions corresponding to low and central order statistics. The following theorems
characterize the asymptotic non-degenerate distributions for these order statistics as functions of the limit
distributions of minima.

Theorem 2 (Asymptotic distributions of low-order statistics) Let $G$ be a continuous cdf that belongs
to the minimal domain of attraction of a cdf $L$ with the associated pair of sequences $\{a_m\}_{m=1}^{\infty}$ and
$\{b_m\}_{m=1}^{\infty}$. Then for $r = o(m)$ we have

$$
\lim_{m \to \infty} Q_{r:m}(w; m) = 1 - \mathbb{1}_{\{L(w) < 1\}} \, [1 - L(w)] \sum_{i=0}^{r-1} \frac{[-\ln(1 - L(w))]^i}{i!} , \quad \forall w \in \mathbb{R} .
\tag{24}
$$

Proof: See [37, Theorem 8.4.1]. ■

Theorem 3 (Asymptotic distributions of central order statistics) Let a continuous cdf $G$ with
associated pdf $g$ be the distribution of $m$ i.i.d. random variables. Let $\zeta$ be a real number in the interval $(0, 1)$
such that $g(G^{-1}(\zeta)) \neq 0$. Then the central order statistic $Y_{r:m}$ with $r = \lceil m \zeta \rceil$ is asymptotically distributed as

$$
\mathcal{N}\left( G^{-1}(\zeta) , \; \frac{\zeta (1 - \zeta)}{m \, [g(G^{-1}(\zeta))]^2} \right) .
$$

Proof: See [37, Theorem 8.5.1]. ■
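The normal approximation for central order statistics, with mean $G^{-1}(\zeta)$ and variance $\zeta(1-\zeta) / (m [g(G^{-1}(\zeta))]^2)$, can be checked numerically in a case where the exact distribution is available. The Python sketch below does this for a Uniform(0, 1) parent, for which $G^{-1}(\zeta) = \zeta$ and $g \equiv 1$. The uniform parent, the sample size $m = 400$, and the function names are assumptions chosen for illustration.

```python
import math


def uniform_order_stat_cdf(y, r, m):
    """Exact P(Y_{r:m} <= y) for m i.i.d. Uniform(0, 1) samples (binomial tail)."""
    return sum(math.comb(m, i) * y**i * (1.0 - y) ** (m - i) for i in range(r, m + 1))


def normal_cdf(x, mean, var):
    """Cdf of a Gaussian with the given mean and variance."""
    return 0.5 * math.erfc(-(x - mean) / math.sqrt(2.0 * var))


m, zeta = 400, 0.5
r = math.ceil(m * zeta)
# For the uniform parent g(G^{-1}(zeta)) = 1, so the variance is zeta (1 - zeta) / m.
var = zeta * (1.0 - zeta) / m

gaps = []
for y in (0.45, 0.475, 0.5, 0.525, 0.55):
    exact = uniform_order_stat_cdf(y, r, m)
    approx = normal_cdf(y, zeta, var)
    gaps.append(abs(exact - approx))
largest_gap = max(gaps)
```

For this sample size the exact and approximate cdfs agree to within a few hundredths at every test point, and the gap shrinks further as $m$ grows.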
Finally, we define asymptotic equivalence and equality as follows.

Definition 4 (Asymptotic equivalence) Two sequences $\{a_m\}$ and $\{b_m\}$ are said to be asymptotically
equivalent, denoted by $a_m \doteq b_m$, when $\lim_{m \to \infty} \frac{a_m}{b_m} = 1$. We also say that $\{a_m\}$ and $\{b_m\}$ are asymptotically
equal, denoted by $a_m \circeq b_m$, when $\lim_{m \to \infty} (a_m - b_m) = 0$.

Definition 5 (Asymptotic dominance) A sequence $\{a_m\}$ is said to be asymptotically dominated by $\{b_m\}$,
denoted by $a_m = o(b_m)$, when $\lim_{m \to \infty} \frac{a_m}{b_m} = 0$. Also, $\{a_m\}$ is said to be asymptotically dominating $\{b_m\}$,
denoted by $a_m = \omega(b_m)$, when $b_m = o(a_m)$.



4 Optimal Sampling 

In this section we determine the optimal stopping time $\tau$ and the associated optimal switching sequence
$\psi(\tau) = \{\psi(1), \dots, \psi(\tau - 1)\}$. For this purpose, we initially characterize the error probability $P_n(\tau, \psi(\tau))$
defined in (15) for any given stopping time $\tau$ and switching sequence $\psi(\tau)$ and then optimize it over the
valid choices of $\tau$ and $\psi(\tau)$ such that the constraints on the aggregate sampling budget and the number of
refinement actions are satisfied. As stated earlier, characterizing the detection performance $P_n(\tau, \psi(\tau))$ for a
given stopping time $\tau$ and a switching sequence $\psi(\tau)$ involves analyzing 1) the evolution of the proportion
of the number of sequences generated by $F_0$ and $F_1$ throughout the refinement actions, and 2) the quality
of the detection action. Analyzing both aspects heavily depends on the underlying distributions $F_0$ and $F_1$,
and the two aspects are treated separately for the two problems given in (17).

4.1 Gaussian Mean

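The hypothesis-testing problem in this case is

$$
\begin{aligned}
\mathsf{H}_0 &: \; X^i_j \sim \mathcal{N}(\mu_0, 1) , \quad j = 1, 2, \dots \\
\mathsf{H}_1 &: \; X^i_j \sim \mathcal{N}(\mu_1, 1) , \quad j = 1, 2, \dots
\end{aligned}
\tag{25}
$$

where $\mu_0 > \mu_1$ and $\mu_0, \mu_1 \in \mathbb{R}$. Under this setting, the likelihood ratio at time $t$ for the sequences $i \in \mathcal{L}_t$ is

$$
\Lambda^i_t = \exp\left\{ (\mu_0 - \mu_1) \sum_{j=1}^{t} X^i_j - \frac{t (\mu_0^2 - \mu_1^2)}{2} \right\} .
\tag{26}
$$

By defining

$$
\forall t \in \{1, \dots, \tau\} \text{ and } \forall i \in \mathcal{L}_t : \quad Z^i_t \triangleq \sum_{j=1}^{t} X^i_j ,
\tag{27}
$$

the detection and refinement actions formalized in (11) and (13), respectively, are equivalently given by

$$
\mathcal{U} = \text{the indices of the } T_n \text{ smallest elements of the set } \{Z^i_\tau : i \in \mathcal{L}_\tau\} , \text{ and}
\tag{28}
$$

$$
\mathcal{L}_{t+1} = \text{the indices of the } \lfloor \alpha |\mathcal{L}_t| \rfloor \text{ smallest elements of the set } \{Z^i_t : i \in \mathcal{L}_t\} .
\tag{29}
$$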

On the other hand, by invoking the distribution of $X^i_t$ from (25) we immediately have

$$
\forall t \in \{1, \dots, \tau\} \text{ and } \forall i \in \mathcal{L}_t : \quad Z^i_t \mid \mathsf{H}_m \sim \mathcal{N}(\mu_m \cdot t , \, t) .
\tag{30}
$$

Furthermore, let us for each $t \in \{1, \dots, \tau\}$ define

$$
\forall j \in \{1, \dots, \bar{n}_t\} : \quad \bar{U}^t_j \triangleq \text{the } j\text{-th smallest element of } \{Z^i_t : i \in \mathcal{L}^0_t\} ,
\tag{31}
$$

$$
\text{and} \quad \forall j \in \{1, \dots, n_t\} : \quad U^t_j \triangleq \text{the } j\text{-th smallest element of } \{Z^i_t : i \in \mathcal{L}^1_t\} .
\tag{32}
$$

Given these definitions, the probability $P_n(\tau, \psi(\tau))$, which is the probability that at least one member of $\mathcal{U}$ is
distributed according to $F_0$, can be equivalently written as

$$
P_n(\tau, \psi(\tau)) = P\big( |\mathcal{U} \cap \mathcal{L}^0_\tau| > 0 \big) = P\big( U^\tau_{T_n} > \bar{U}^\tau_1 \big) .
\tag{33}
$$

Assessing this performance measure involves finding the cardinalities and the distributions of the first and
the $T_n$-th order statistics of the sets of random variables $\{Z^i_\tau : i \in \mathcal{L}^0_\tau\}$ and $\{Z^i_\tau : i \in \mathcal{L}^1_\tau\}$, respectively.
In order to proceed with analyzing $P_n(\tau, \psi(\tau))$, we need the distributions of these order statistics. As the
first step, the following lemma determines to what minimal domain of attraction the standard Gaussian
distribution belongs. Also, this lemma in conjunction with Theorem 3 determines the distribution of the
$T_n$-th order statistic.
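The equivalence used in (33) can be illustrated directly: the set of the $T_n$ smallest statistics contains a normal sequence exactly when the $T_n$-th smallest statistic among the rare sequences exceeds the smallest statistic among the normal ones. The Python sketch below compares the two formulations on a small hand-made example. The function names and the numerical values are hypothetical, and the sketch assumes no ties, at least $T_n$ rare sequences, and at least one normal sequence.

```python
def detected_set(z_by_index, t_n):
    """Indices of the T_n smallest statistics."""
    ranked = sorted(z_by_index, key=lambda i: (z_by_index[i], i))
    return set(ranked[:t_n])


def error_by_definition(z_by_index, rare, t_n):
    """True when the detected set contains at least one normal sequence."""
    return any(i not in rare for i in detected_set(z_by_index, t_n))


def error_by_order_statistics(z_by_index, rare, t_n):
    """True when the T_n-th smallest rare statistic exceeds the smallest normal one."""
    z_rare = sorted(z for i, z in z_by_index.items() if i in rare)
    z_normal = [z for i, z in z_by_index.items() if i not in rare]
    return z_rare[t_n - 1] > min(z_normal)


# Hypothetical statistics; sequences 0, 1, and 2 are the rare ones.
z = {0: -5.0, 1: -4.2, 2: -1.0, 3: 0.3, 4: -4.5, 5: 1.1}
rare = {0, 1, 2}

# T_n = 2: the normal sequence 4 ranks second, so the detection is erroneous.
error_two = error_by_definition(z, rare, 2)
# T_n = 1: only sequence 0 is declared, which is rare.
error_one = error_by_definition(z, rare, 1)
```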
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Lemma 1 (Gaussian Minimum) The standard Gaussian distribution belongs to the minimal domain of 
attraction of 

L{w) = 1 - exp (- exp(w - In 2^7?)) , Vw G M , 



corresponding to (22) for k — > 0. The associated affine transformation is characterized by the sequences 



{flm} and {bm} given b y 



where /i : {x G M : a; > 1} 



h(m) , and b^ = \/h(rn) . Vm G {2, 3, . . . } 



is defined as 



h{x) = 2 In a; — In In X 



(34) 



Proof: See Appendix |Xj ■ 
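The normalization in Lemma 1 can be checked numerically. The sketch below evaluates the exact finite-$m$ cdf of the normalized minimum, $1 - [1 - G((w - h(m))/\sqrt{h(m)})]^m$, and compares it with the limit $L(w)$. The function names are illustrative, and the convergence is only logarithmic in $m$, so a small gap at finite $m$ is expected.

```python
import math


def h(x):
    """h(x) = 2 ln x - ln ln x from (34)."""
    return 2.0 * math.log(x) - math.log(math.log(x))


def gaussian_cdf(u):
    """Standard Gaussian cdf G(u), written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))


def min_cdf_finite(w, m):
    """Exact cdf of W_{1:m} = h(m) + sqrt(h(m)) * Y_{1:m} for m i.i.d. standard Gaussians."""
    u = (w - h(m)) / math.sqrt(h(m))
    return 1.0 - math.exp(m * math.log1p(-gaussian_cdf(u)))


def min_cdf_limit(w):
    """Limit distribution L(w) from Lemma 1."""
    return 1.0 - math.exp(-math.exp(w - math.log(2.0 * math.sqrt(math.pi))))
```

For instance, `min_cdf_finite(0.0, 10**6)` and `min_cdf_limit(0.0)` agree to within about one percent.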
In the first step we assess the variations of $n_t$, $\bar n_t$, and their associated ratio throughout the refinement actions. The following lemma sheds light on the tendency of the refinement action (A2) towards retaining almost all the rare events while discarding a proportion of the normal events.

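Lemma 2 (Refinement Performance) Let $\bar n_t = |\mathcal{L}_t^0|$ and $n_t = |\mathcal{L}_t^1|$ denote the number of sequences generated by $F_0$ and $F_1$, respectively, that are retained up to time $t$. For any arbitrary $\delta \in (0,1)$ and for sufficiently large $n$, the event

$$n_\tau \geq (1-\delta)\, n_1 \tag{35}$$

holds almost surely if

$$\mu_0 - \mu_1 = \omega\left(n^{-\varepsilon_n/2}\right) , \tag{36}$$

where $\varepsilon_n$ is defined as

$$\varepsilon_n \triangleq \frac{\ln (n\epsilon_n)}{\ln n} . \tag{37}$$

Proof: See Appendix B. ■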
Therefore, when the condition in (36) is satisfied, the refinement actions almost surely discard no more than a fraction $\delta$ of the rare events, for any arbitrary $\delta \in (0,1)$. Consequently, the final ratio of the number of rare events to that of the normal events increases dramatically throughout the refinement actions. Besides the performance of the refinement actions, the overall detection reliability also depends on the performance of the detection action (A1). The next lemma describes a necessary and sufficient condition that guarantees asymptotically error-free detection.

Lemma 3 (Detection Performance) For a given stopping time $\tau$ and switching sequence $\psi(\tau)$, the detection error probability $P_n(\tau,\psi(\tau))$ tends to zero in the asymptote of large $n$ if and only if

$$r_m > \frac{(1-\sqrt{\varepsilon_n})^2}{\tau} , \tag{38}$$

where we have defined

$$r_m \triangleq \frac{(\mu_0-\mu_1)^2}{2\ln n} . \tag{39}$$

Proof: See Appendix C. ■
Therefore, this lemma, conditionally on the value of the stopping time, which is stochastic, provides a necessary and sufficient condition on the distance between distributions $F_0$ and $F_1$ (captured by $(\mu_0 - \mu_1)$) for achieving asymptotically optimal detection performance. In the next lemma we show that the stochastic stopping time is upper bounded by a constant.

Lemma 4 The stopping time $\tau$ is upper bounded by $S/\alpha^K$ for sufficiently large $n$.

Proof: See Appendix D. ■
Combining the results of Lemmas 3 and 4 and replacing the stopping time $\tau$ in (38) with its upper bound from Lemma 4 provides that

$$\frac{(\mu_0-\mu_1)^2}{2\ln n} > \frac{(1-\sqrt{\varepsilon_n})^2}{S/\alpha^K}$$

is a necessary condition for ensuring asymptotically error-free detection. This clearly imposes a more stringent condition on the distance $(\mu_0 - \mu_1)$ than (36), which is a sufficient condition for retaining at least a fraction $(1-\delta)$ of the rare events throughout the refinement cycles. As a result, irrespective of the optimal value of the stopping time $\tau$, ensuring asymptotically error-free detection requires the refinement process to retain at least a fraction $(1-\delta)$ of the rare events.
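As a rough numerical illustration of the threshold in (38), the sketch below converts a prior probability into the exponent $\varepsilon_n$ and returns the smallest mean gap that meets the condition for a given stopping time. It assumes $\varepsilon_n = \ln(n\epsilon_n)/\ln n$ and $r_m = (\mu_0-\mu_1)^2/(2\ln n)$; the function names and the numerical values are illustrative only.

```python
import math


def sparsity_exponent(n, prior):
    """epsilon_n = ln(n * prior) / ln(n), so that n * prior = n ** epsilon_n (assumed definition)."""
    return math.log(n * prior) / math.log(n)


def min_mean_gap(n, prior, tau):
    """Boundary value of mu0 - mu1 at which r_m equals (1 - sqrt(eps_n))**2 / tau."""
    eps = sparsity_exponent(n, prior)
    r_threshold = (1.0 - math.sqrt(eps)) ** 2 / tau
    return math.sqrt(2.0 * r_threshold * math.log(n))
```

For $n = 10^6$ and prior $10^{-3}$ the exponent is $0.5$, and quadrupling the stopping time halves the required gap, which reflects the $1/\tau$ dependence in (38).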

Finally, we would like to remark that while the exact characterization of the error probability $P_n(\tau,\psi(\tau))$ does depend on $T_n$ (the number of events to be returned by the detector), the scaling rate given in (38) does not depend on $T_n$. The intuitive reason is that in analyzing such scaling laws, in the asymptote of large $n$, only the dominant (leading) terms survive and the impacts of the non-leading terms vanish as $n$ grows. $T_n$ appears to impact only the non-leading terms and its impact is not observed on the ultimate scaling laws. In a more technical sense, the reason lies at the core of the analysis on order statistics. Note that the focus of this paper is on the regime $T_n = o(n\epsilon_n)$. The analysis reveals that while the error probabilities vary for different choices of $T_n$, the scaling laws of the mean values are identical for all values of $T_n$ in the regime $T_n = o(n\epsilon_n)$. The underlying reason for this observation is that according to the definition of the detection action, the ultimate $T_n$ sequences selected are the sequences with the smallest likelihood ratios, and hence constitute the $T_n$ smallest order statistics of the set comprising the likelihood ratios of the retained sequences. When $T_n = o(n\epsilon_n)$ all these order statistics are of low order (in the sense of the earlier definition of low-order statistics) and have identical asymptotic impacts on the scaling rates.

4.2 Gaussian Variance 

In this section we analyze the performance of detection and refinement in the Gaussian variance hypothesis testing problem. The presentation of the results follows the same flow as the problem of testing the mean. The proofs, however, are different due to the different statistical behavior of the likelihood ratio and its pertinent sufficient statistic. The problem of interest can be posed as

$$\begin{aligned} \mathsf{H}_0 &: \ X_j^i \sim \mathcal{N}(0,\lambda_0), \quad j = 1,2,\dots \\ \mathsf{H}_1 &: \ X_j^i \sim \mathcal{N}(0,\lambda_1), \quad j = 1,2,\dots \end{aligned} \tag{40}$$

where $\lambda_0 > \lambda_1$ and $\lambda_0, \lambda_1 \in \mathbb{R}_+$. Hence, the likelihood ratio at time $t$ for the sequences $i \in \mathcal{L}_t$ is



(41) 



Similarly to (27) we define

$$\forall t \in \{1,\dots,\tau\} \ \text{and} \ \forall i \in \mathcal{L}_t: \quad Z_t^i \triangleq \sum_{j=1}^{t} (X_j^i)^2 . \tag{42}$$

Therefore, the indices of the sequences yielded by the detection and refinement actions defined in (11) and (13), respectively, are given by

$$\mathcal{U} = \text{the indices of the } T_n \text{ smallest elements of the set } \{Z_\tau^i : i \in \mathcal{L}_\tau\} , \ \text{and} \tag{43}$$
$$\mathcal{L}_{t+1} = \text{the indices of the } \lfloor \alpha|\mathcal{L}_t| \rfloor \text{ smallest elements of the set } \{Z_t^i : i \in \mathcal{L}_t\} , \tag{44}$$

where $\alpha$ is defined in (14). By recalling the distribution of $X_j^i$ given in (40) we have

$$\forall t \in \{1,\dots,\tau\} \ \text{and} \ \forall i \in \mathcal{L}_t: \quad Z_t^i \mid \mathsf{H}_\ell \sim \lambda_\ell \cdot \chi^2(t) , \tag{45}$$

where $\chi^2(k)$ denotes a chi-squared distribution with $k$ degrees of freedom. Next, for each $t \in \{1,\dots,\tau\}$ let us define

$$\forall j \in \{1,\dots,\bar n_t\}: \quad \bar V_t^j = \text{the } j^{th} \text{ smallest element of } \{Z_t^i : i \in \mathcal{L}_t^0\} , \tag{46}$$
$$\text{and} \ \forall j \in \{1,\dots,n_t\}: \quad V_t^j = \text{the } j^{th} \text{ smallest element of } \{Z_t^i : i \in \mathcal{L}_t^1\} . \tag{47}$$

As shown in (33) the probability $P_n(\tau,\psi(\tau))$ is given by

$$P_n(\tau,\psi(\tau)) = \mathbb{P}\left(|\mathcal{U} \cap \mathcal{L}_\tau^0| > 0\right) = \mathbb{P}\left(V_\tau^{T_n} > \bar V_\tau^{1}\right) . \tag{48}$$

As assessing this probability requires knowing the distributions of the first and the $T_n^{th}$ order statistics of the sets of random variables $\{Z_\tau^i : i \in \mathcal{L}_\tau^0\}$ and $\{Z_\tau^i : i \in \mathcal{L}_\tau^1\}$, respectively, in the following lemma we provide the minimal domain of attraction of chi-squared distributions.

Lemma 5 ($\chi^2$ Minimum) The chi-squared distribution with $k$ degrees of freedom belongs to the minimal domain of attraction of

$$L(w) = 1 - \exp\left(-w^{k/2}\right) , \quad \forall w \in \mathbb{R}_+ ,$$

which corresponds to (22) for $\kappa \to 0$. The associated affine transformation is characterized by the sequences $\{a_m\}$ and $\{b_m\}$ given by

$$a_m = 0 , \quad \text{and} \quad b_m = \frac{1}{2}\left[\frac{m}{\Gamma(k/2+1)}\right]^{2/k} , \quad \forall m \in \{2,3,\dots\} .$$

Proof: See Appendix E. ■
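The scaling in Lemma 5 can be sanity-checked numerically. The sketch below assumes the normalization $W_i = b_m Y_i$ with $b_m = \frac{1}{2}[m/\Gamma(k/2+1)]^{2/k}$ and, to stay within the standard library, uses the closed-form chi-squared cdf that is available only for even $k$; that restriction is a convenience of this sketch, not of the lemma.

```python
import math


def chi2_cdf_even(y, k):
    """Closed-form chi-squared cdf, valid only for even k (a restriction of this sketch)."""
    half = y / 2.0
    tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k // 2))
    return 1.0 - tail


def b_m(m, k):
    """Assumed scaling sequence b_m = (1/2) * (m / Gamma(k/2 + 1)) ** (2/k)."""
    return 0.5 * (m / math.gamma(k / 2 + 1)) ** (2.0 / k)


def min_cdf_finite(w, m, k):
    """Exact cdf of the normalized minimum b_m * Y_{1:m} of m i.i.d. chi-squared(k) variables."""
    f = chi2_cdf_even(w / b_m(m, k), k)
    return 1.0 - math.exp(m * math.log1p(-f))


def min_cdf_limit(w, k):
    """Limit distribution 1 - exp(-w^{k/2}) from Lemma 5."""
    return 1.0 - math.exp(-w ** (k / 2))
```

For $k = 2$ the finite-$m$ expression coincides with the limit up to rounding, and for $k = 4$ the gap shrinks as $m$ grows.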
Also, as will be shown later, for analyzing the performance of the detection action we need to determine to what maximal domain of attraction a distribution belongs.

Lemma 6 ($\chi^2$ Maximum) The chi-squared distribution with $k$ degrees of freedom belongs to the maximal domain of attraction of

$$H(w) = \exp\left(-\frac{1}{\Gamma(k/2)}\cdot\exp(-w)\right) , \quad \forall w \in \mathbb{R} ,$$

which corresponds to (23) for $\kappa \to 0$. The associated affine transformation is characterized by the sequences $\{c_m\}$ and $\{d_m\}$ given by

$$c_m = -\left(\ln m + \left(\frac{k}{2}-1\right)\ln\ln m\right) , \quad \text{and} \quad d_m = \frac{1}{2} , \quad \forall m \in \{2,3,\dots\} .$$

Proof: See Appendix F. ■
In the next lemma we provide the performance of the refinement action in retaining the sequences generated by $F_0$ and $F_1$.

Lemma 7 (Refinement Performance) Let $\bar n_t = |\mathcal{L}_t^0|$ and $n_t = |\mathcal{L}_t^1|$ denote the number of sequences generated by $F_0$ and $F_1$, respectively, that are retained up to time $t$. For sufficiently large $n$, the event

$$n_\tau = n_1 \tag{49}$$

holds almost surely if

$$\frac{\lambda_0}{\lambda_1} = \omega\left(\varepsilon_n \ln n\right) , \tag{50}$$

where $\varepsilon_n$ is defined as

$$\varepsilon_n \triangleq \frac{\ln (n\epsilon_n)}{\ln n} . \tag{51}$$

Proof: See Appendix G. ■



Therefore, when the scaling law in (50) is satisfied, the proportion of the sequences generated by $F_1$ to those generated by $F_0$ increases after the refinement actions. In the next lemma we provide a necessary and sufficient condition on the scaling of $\lambda_0/\lambda_1$ that ensures perfect identification of the $T_n$ sequences generated by $F_1$ (rare events).

Lemma 8 (Detection Performance) For a given stopping time $\tau$ and switching sequence $\psi(\tau)$, and conditionally on $n_\tau$ and $\bar n_\tau$, the detection error probability $P_n(\tau,\psi(\tau))$ tends to zero in the asymptote of large $n$ if and only if

$$r_v > \frac{2(1-\varepsilon_n)}{\tau} , \tag{52}$$

where we have defined

$$r_v \triangleq \frac{\ln (\lambda_0/\lambda_1)}{\ln n} . \tag{53}$$

Proof: See Appendix H. ■
By comparing the scaling laws offered by Lemmas 7 and 8 we find that the scaling law necessary for making a reliable detection, irrespective of $\tau$ and $\psi(\tau)$, dominates the one that is sufficient for maintaining $n_\tau = n_1$ almost surely. In other words, whenever reliable detection is possible, the refinement action (A2) retains all the rare events almost surely.



4.3 Optimal Stopping Time 

Given the performance of the refinement action offered by Lemmas 2 and 7 and the detection action given by Lemmas 3 and 8, in this section we provide the optimal choices of the stopping time and the switching sequence. Given the discussions at the end of Sections 4.1 and 4.2, irrespective of the discrepancies in the analysis and the ensuing scaling laws in the mean and variance settings, these two settings conform in the fact that targeting error-free detection forces the refinement process to retain almost all of the rare events. More specifically, the refinement process is guaranteed (probabilistically) to discard at most a fraction $\delta$ of the rare events in testing the mean and to retain almost all such events in testing the variance.

Primarily based on the similar behavior of the refinement process in both settings, the optimal choices of the stopping time and the switching sequence turn out to be exactly the same in both settings. The scheme of the proofs in this section is as follows. By using the result of Lemma 4, which states that the stopping time $\tau$ is upper bounded by a constant, it is concluded that regardless of the choice of the stopping time and the switching sequence, the conditions that Lemmas 3 and 8 impose on $(\mu_0 - \mu_1)$ and $\lambda_0/\lambda_1$, respectively, dominate those imposed by Lemmas 2 and 7, respectively. This immediately indicates that the refinement process in both settings retains almost all rare events almost surely. Based on this property of the refinement process, we obtain the choices of the stopping time and switching sequence that minimize the error probability.



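Theorem 4 (Stopping Time) For achieving $\mathcal{P}_n(S,K)$, the optimal switching sequence satisfies

$$\forall t \in \{1,\dots,K^*\}: \ \psi(t) = 1 , \quad \text{and} \quad \forall t > K^*: \ \psi(t) = 0 , \tag{54}$$

where

$$K^* = \begin{cases} K , & \text{if } \alpha < 1 - \frac{1}{S} \\ 0 , & \text{if } \alpha \geq 1 - \frac{1}{S} \end{cases} . \tag{55}$$

Also the optimal stopping time is

$$\tau = \begin{cases} K + s(K) , & \text{if } \alpha < 1 - \frac{1}{S} \\ S , & \text{if } \alpha \geq 1 - \frac{1}{S} \end{cases} , \tag{56}$$

where

$$s(K) \triangleq S\cdot\alpha^{-K} + \frac{1-\alpha^{-K}}{1-\alpha} . \tag{57}$$

Proof: See Appendix I. ■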
This theorem demonstrates that when $\alpha$ is large (close to 1) the optimal sampling does not involve any refinement action and adaptive sampling does not offer any gain over the non-adaptive sampling procedure. However, for sufficiently large $S$, only for limited choices of $\alpha$ does adaptation in sampling offer no gain, and for a wide range of $\alpha$ the optimal sampling procedure involves refinement actions and becomes adaptive. Throughout the remainder of this paper we focus on the regime $\alpha < 1 - \frac{1}{S}$.
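The schedule in Theorem 4 can be written out as a short sketch. It assumes the form $s(K) = S\alpha^{-K} + (1-\alpha^{-K})/(1-\alpha)$, which is the value for which $K$ refinement cycles followed by $s(K)$ further samples of the final pool exhaust the normalized budget $S$ (ignoring integer rounding); the function names are illustrative.

```python
def refined_samples(S, alpha, K):
    """s(K) = S * alpha**(-K) + (1 - alpha**(-K)) / (1 - alpha) (assumed form of (57))."""
    return S * alpha ** (-K) + (1.0 - alpha ** (-K)) / (1.0 - alpha)


def optimal_schedule(S, alpha, K):
    """Return (K_star, tau_star): number of refinement actions and stopping time."""
    if alpha < 1.0 - 1.0 / S:
        return K, K + refined_samples(S, alpha, K)
    return 0, S


def normalized_budget_used(alpha, K, s):
    """Budget (in units of n) spent by K refinement cycles plus s samples of the final pool."""
    return sum(alpha ** t for t in range(K)) + s * alpha ** K
```

For example, with $S = 4$, $\alpha = 0.5$ and $K = 2$ the sketch gives $s(K) = 10$ and a stopping time of 12, three times the non-adaptive stopping time $S = 4$ under the same budget.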



5 Adaptation Gains 

The main component of the proposed sampling procedure is the inclusion of the refinement actions, which 
makes the detection procedure adaptive to the data. In this section we show the gains attained by the 
inclusion of the refinement process. These gains can be viewed as the gain of adapting the detection process 
to the observed data and can be interpreted in two ways, namely in the forms of agility and scaling gains 
defined in the sequel. These gains essentially evaluate the contribution of the refinement actions by comparing 
the performance of the proposed sampling procedure against the same procedure without any refinement 
action, i.e., $K = 0$.

Definition 6 (Agility Gain) The agility gain, denoted by $G_a$, is the ratio of the minimum sampling budget required by the non-adaptive procedure, i.e., $K = 0$, to that required by the adaptive procedure with $K > 0$ refinement actions such that both achieve asymptotically error-free detection while enjoying identical scaling for $(\mu_0 - \mu_1)$ in the mean setting or identical scaling for $\lambda_0/\lambda_1$ in the variance setting.

In order to quantify this gain in the Gaussian mean setting we consider a non-adaptive detection procedure with the aggregate sampling budget controlled by $S_0$ and obtain the required scaling law for $(\mu_0 - \mu_1)$ that guarantees asymptotically error-free detection. For the same scaling law in an adaptive procedure with $K$ refinement actions we assess the minimum sampling budget $S$ that ensures asymptotically error-free detection by the adaptive procedure as well. Then we find the agility gain as the ratio $G_a = S_0/S$. In order to analyze the agility gain for the variance setting we repeat the same steps by replacing all the arguments on the scaling of $(\mu_0 - \mu_1)$ by those on the scaling of $\lambda_0/\lambda_1$.

Definition 7 (Scaling Gain) The scaling gain, denoted by $G_s$, is the ratio of the smallest scaling law required by the non-adaptive procedure to that required by the adaptive procedure when both aim at achieving asymptotically error-free detection while enjoying identical sampling budgets.

In order to analyze the scaling gain of the adaptive procedure with $K$ refinement actions in the Gaussian mean setting we assume that both adaptive and non-adaptive procedures are allocated the aggregate sampling budget $S \cdot n$ and aim at identifying the smallest values of $r_m^{A}$ and $r_m^{NA}$ such that the scaling laws $(\mu_0 - \mu_1) = \sqrt{2 r_m^{A} \ln n}$ and $(\mu_0 - \mu_1) = \sqrt{2 r_m^{NA} \ln n}$ guarantee asymptotically error-free detection for the adaptive and the non-adaptive procedures, respectively. The scaling gain can be found as $G_s = r_m^{NA}/r_m^{A}$. For computing the scaling gain in the variance setting we follow the same logic and instead of finding $r_m^{A}$ and $r_m^{NA}$ we consider the scaling laws $\lambda_0/\lambda_1 = n^{r_v^{A}}$ and $\lambda_0/\lambda_1 = n^{r_v^{NA}}$ for the adaptive and non-adaptive procedures and find $r_v^{A}$ and $r_v^{NA}$, respectively. In this case the scaling gain is defined as $G_s = r_v^{NA}/r_v^{A}$.

Analyzing the scaling and agility gains strongly relies on the connection between the available sampling budget $S$ and the scaling laws that lead to reliable detection of the sequences of interest. The following two theorems establish this connection.




Theorem 5 (Mean Scaling) When $\alpha < 1 - \frac{1}{S}$ a necessary and sufficient condition for $\mathcal{P}_n(S,K) \xrightarrow{n\to\infty} 0$ is that

$$r_m > \frac{(1-\sqrt{\varepsilon_n})^2}{K + s(K)} , \tag{58}$$

where $r_m$ is defined in (39) and $s(K)$ was defined as

$$s(K) \triangleq S\cdot\alpha^{-K} + \frac{1-\alpha^{-K}}{1-\alpha} , \tag{59}$$

where $\alpha$ controls what fraction of the events are discarded at each refinement cycle, $S$ is the aggregate sampling budget normalized by $n$, and $K$ is the maximum allowable number of refinement actions.

Proof: The proof can be established by combining the results from Lemma 3 and the optimal stopping time given in Theorem 4. As shown in the proof of Lemma 3, a necessary and sufficient condition for having asymptotically error-free detection is that $B_n = \omega(1)$, where $B_n$ is defined in (78). Moreover, we have also proved that a necessary and sufficient condition for $B_n = \omega(1)$ is that

$$r_m > \frac{(1-\sqrt{\varepsilon_n})^2}{\tau} .$$

On the other hand, in the proof of Theorem 4 we showed that the detection error probability is minimized when $\tau$ is maximized. In other words, the smallest necessary scaling law for $(\mu_0 - \mu_1)^2$ is obtained when $\tau$ is maximized and is equal to $K + s(K)$. By substituting this value into the equation above we obtain the desired result. ■
For a given sampling budget $S \cdot n$ and number of refinement actions $K$, this theorem delineates the asymptotic performance of the proposed detector in the $(r_m, \varepsilon_n)$ plane. It shows that when $(\mu_0 - \mu_1)$ scales as $\sqrt{2 r \ln n}$, if $r > r_m$, where $r_m$ here denotes the threshold value on the right-hand side of (58), the proposed sequential detection procedure is guaranteed to make error-free decisions for identifying $T_n$ sequences generated by $F_1$. On the other hand, when $r < r_m$ the probability of erroneous detection is bounded away from zero. Therefore, $r_m$ defines a sharp threshold for identifying the sequences of interest under the objective and constraints in (16).

On a related context, note the relevant results provided by pS| and ^33^ when the objective is to identify 
all sequences generated by $F_1$ through observing each sequence only once, i.e., $S = 1$ and $K = 0$. These works show that for $\varepsilon_n \in (0, \frac{1}{2})$ the false-discovery and non-discovery proportions† tend to zero if and only if $(\mu_0 - \mu_1)$ scales as $\sqrt{2 r \ln n}$ and $r > 1 - \varepsilon_n$. This is clearly a more stringent condition than the requirement $r > r_m = (1 - \sqrt{\varepsilon_n})^2$ corresponding to the setting $S = 1$ and $K = 0$ in our sampling procedure. This discrepancy demonstrates the tradeoff between partial versus full recovery of the sequences of interest on one hand, and the required scaling law on $(\mu_0 - \mu_1)$ on the other hand.

In the following theorem we focus on the variance setting and by using the result of Lemma 8 and Theorem 4 we offer a necessary and sufficient condition on the scaling of $\lambda_0/\lambda_1$ in order to guarantee asymptotically error-free detection.



† The false-discovery proportion is the number of falsely discovered sequences that are generated by $F_1$ relative to the total number of sequences, and the non-discovery proportion is the ratio of the number of sequences generated by $F_0$ that are missed to the entire number of sequences.

Figure 1: Detectable vs. non-detectable regions. (a) Mean scaling. (b) Variance scaling.

Theorem 6 (Variance Scaling) When $\epsilon_n = o(1)$, $n\epsilon_n = \omega(1)$, and $\alpha < 1 - \frac{1}{S}$ a necessary and sufficient condition for $\mathcal{P}_n(S,K) \xrightarrow{n\to\infty} 0$ is that

$$r_v > \frac{2(1-\varepsilon_n)}{K + s(K)} , \tag{60}$$

where $r_v$ is defined in (53) and $s(K)$ was defined as

$$s(K) \triangleq S\cdot\alpha^{-K} + \frac{1-\alpha^{-K}}{1-\alpha} , \tag{61}$$

where $\alpha$ controls what fraction of the events are discarded at each refinement cycle, $S$ is the aggregate sampling budget normalized by $n$, and $K$ is the maximum allowable number of refinement actions.

Proof: This result follows from the same line of argument as in the proof of Theorem 5. ■
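The conditions in Theorems 5 and 6 can be compared for the adaptive and non-adaptive procedures with a small sketch. It assumes the thresholds $(1-\sqrt{\varepsilon_n})^2/(K+s(K))$ and $2(1-\varepsilon_n)/(K+s(K))$ with $s(K) = S\alpha^{-K} + (1-\alpha^{-K})/(1-\alpha)$, and it treats the non-adaptive procedure as the case $K = 0$, for which $K + s(K) = S$. The function names and numerical values are illustrative.

```python
import math


def effective_tau(S, alpha, K):
    """K + s(K); reduces to S when K = 0 (non-adaptive sampling)."""
    return K + S * alpha ** (-K) + (1.0 - alpha ** (-K)) / (1.0 - alpha)


def mean_detectable(r_m, eps, S, alpha, K):
    """Condition (58) on r_m for the mean setting."""
    return r_m > (1.0 - math.sqrt(eps)) ** 2 / effective_tau(S, alpha, K)


def variance_detectable(r_v, eps, S, alpha, K):
    """Condition (60) on r_v for the variance setting."""
    return r_v > 2.0 * (1.0 - eps) / effective_tau(S, alpha, K)
```

With $S = 4$, $\alpha = 0.5$ and $\varepsilon_n = 0.25$, the point $r_m = 0.03$ satisfies the condition for $K = 2$ but not for $K = 0$, which is the kind of point at which only the adaptive procedure succeeds.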



Figure 1(a) compares the regions in the $(r_m, \varepsilon_n)$ plane over which the adaptive and the non-adaptive procedures are guaranteed to make error-free detections. Specifically, the diagonally shaded region is the region in which both schemes succeed in detecting the $T_n$ sequences of interest. In the vertically dashed region, however, only the adaptive procedure succeeds and the non-adaptive procedure makes an erroneous decision almost surely, and finally both schemes fail in the horizontally shaded region. It is observed that, depending on the choice of $K$, the detectability region corresponding to the adaptive procedure can be substantially larger than the one corresponding to the non-adaptive procedure. Figure 1(b) depicts the counterpart regions in the variance setting in the $(r_v, \varepsilon_n)$ plane. Given the necessary and sufficient conditions on the scaling law for performing error-free detection, in the following corollaries we determine the agility and scaling gains.

Corollary 1 (Agility Gain) When $\epsilon_n = o(1)$, $n\epsilon_n = \omega(1)$, and $\alpha < 1 - \frac{1}{S}$ the agility gain satisfies

< So{G-'-a^) < , (62) 

1 — a 1 — a 

and in the asymptote of large $K$, the upper and lower bounds on the agility gain meet and are equal to $S_0(1 - \alpha)$.

Proof: See Appendix J. ■
This result indicates that for a given set of sequences, the proposed sequential detection procedure can make a detection decision with substantially fewer measurements compared with the non-adaptive sampling procedure. Specifically, the sampling budget required by the adaptive procedure reduces exponentially with the number of refinement actions $K$. It is noteworthy that while the number of refinement actions $K$ can be made arbitrarily large (but fixed as a function of $n$), increasing it beyond some point increases the agility gain only marginally. More specifically, for large $K$ both upper and lower bounds on the agility gain tend to $S_0(1 - \alpha)$, which is a constant. This observation sheds light on the fundamental limit of the agility gain yielded by adaptive sampling.
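The saturation of the agility gain can be illustrated numerically. The sketch below rests on an assumption drawn from Theorems 5 and 6: the detection threshold depends on the budget only through the stopping time, which equals $S_0$ for the non-adaptive procedure and $K + s(K)$ for the adaptive one. Matching the two gives $S = \alpha^K(S_0 - K) + (1-\alpha^K)/(1-\alpha)$ and $G_a = S_0/S$. The function names are illustrative.

```python
def adaptive_budget(S0, alpha, K):
    """Budget S for which K + s(K) equals the non-adaptive stopping time S0 (assumed matching rule)."""
    return alpha ** K * (S0 - K) + (1.0 - alpha ** K) / (1.0 - alpha)


def agility_gain(S0, alpha, K):
    """G_a = S0 / S under the matching rule above."""
    return S0 / adaptive_budget(S0, alpha, K)
```

For $S_0 = 12$, $\alpha = 0.5$ and $K = 2$ the adaptive budget is $S = 4$, so $G_a = 3$; for $S_0 = 20$ the gain approaches $S_0(1-\alpha) = 10$ as $K$ grows.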

In Fig. 2 we provide a numerical comparison between the proposed adaptive sampling scheme and the
CUSUM test repeated Tn times. With the target error probability Vn{S,K) = 10^^, the plot depicts the 
necessary average number of samples for the CUSUM test and the necessary number of samples for the
adaptive sampling schemes for different number of refinement actions. We consider n ~ 10** events and 
set the prior probability e„ — n^^^ and aim to identify r„ — \Jn^n rare events. The normal events are 
distributed as A/'(0, Aq and the rare events as A/'(0, A\) with the variance values satisfying AO/Al = n)-!'^^ . It 
is observed that for the setting in which distribution $F_1$ occurs very rarely (i.e., smaller values of $\epsilon_n$) the adaptive sampling strategy is quicker than the repeated CUSUM tests. On the other hand, as the priors increase, the repeated CUSUM test outperforms the adaptive sampling strategy.

Corollary 2 (Scaling Gain) When $\alpha < 1 - \frac{1}{S}$ the scaling gain satisfies



a 



S 1 — ay \ S 1 — a 

and in the asymptote of large K the upper and lower bounds on Gg meet and are equal to [1 — s(i-a) )• 

Proof: The proof follows the same line of argument as the proof of Corollary 1 and the characterization of $r_v$ in (53).



Figure 2: Normalized sampling budget $S$ versus the prior probability controlled by $\epsilon_n$.

This corollary indicates that the scaling gain grows exponentially with the number of refinement actions. This result also indicates that the adaptive procedure can detect signals with much smaller means than those detectable by the non-adaptive procedure. More specifically, by noting that $\alpha^{-K}$ is substantially larger than 1, the mean scaling requirement in the adaptive scenario becomes substantially less stringent than its counterpart in the non-adaptive scenario. As a result, there are scenarios in which non-adaptive schemes fail to successfully identify the $T_n$ sequences of interest, while the adaptive scheme succeeds.
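The exponential growth of the scaling gain can be illustrated with a short sketch. It assumes, based on Theorems 5 and 6, that under a common budget $S$ the non-adaptive threshold scales as $1/S$ and the adaptive one as $1/(K + s(K))$, so that $G_s = (K + s(K))/S$. Under this assumption the normalized quantity $\alpha^K G_s$ tends to $1 - \frac{1}{S(1-\alpha)}$ for large $K$. The function name is illustrative.

```python
def scaling_gain(S, alpha, K):
    """G_s = (K + s(K)) / S, assuming thresholds inversely proportional to the stopping time."""
    tau = K + S * alpha ** (-K) + (1.0 - alpha ** (-K)) / (1.0 - alpha)
    return tau / S
```

For $S = 4$ and $\alpha = 0.5$ the gain is 3 at $K = 2$ and roughly doubles with each additional refinement action once $K$ is moderately large.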



6 Conclusion 

In this paper we have presented an adaptive sampling methodology for quickly searching over finitely many 
events with the objective of identifying multiple events that occur sparsely and are distributed according to 
a given distribution of interest. The main idea of the sampling procedure is to successively and gradually 
adjust the measurement process using information gleaned from the previous measurements. Compared to 
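corresponding non-adaptive procedures, dramatic gains in terms of reliability and agility are achieved.

A Proof of Lemma 1

Let us define $Y_1,\dots,Y_m$ as i.i.d. standard Gaussian random variables with cdf $G$ and pdf $g$ and define the function $h : \{x \in \mathbb{R} : x > 1\} \to \mathbb{R}_+$ as

$$h(x) = 2\ln x - \ln\ln x .$$

By defining

$$W_i \triangleq a_m + b_m Y_i , \tag{63}$$

and setting

$$a_m = h(m) , \quad \text{and} \quad b_m = \sqrt{h(m)} , \tag{64}$$

for the cdf of $W_{1:m}$, denoted by $Q_{1:m}(\cdot\,; m)$, we have

$$1 - Q_{1:m}(w;m) = 1 - \mathbb{P}(W_{1:m} \leq w) \overset{(63)}{=} 1 - \mathbb{P}\left(Y_{1:m} \leq \frac{w - a_m}{b_m}\right) = \left[1 - G\left(\frac{w - a_m}{b_m}\right)\right]^m \overset{(64)}{=} \left[1 - G\left(\frac{w - h(m)}{\sqrt{h(m)}}\right)\right]^m .$$

We next show that

$$\lim_{x\to\infty} x \cdot \log\left[1 - G\left(\frac{w - h(x)}{\sqrt{h(x)}}\right)\right] = -\exp(w - \ln 2\sqrt{\pi}) . \tag{65}$$

By using L'Hôpital's rule we obtain

$$\begin{aligned} \lim_{x\to\infty} \frac{\log\left[1 - G\left(\frac{w - h(x)}{\sqrt{h(x)}}\right)\right]}{1/x} &= \lim_{x\to\infty} \frac{-x^2\, \frac{h'(x)}{\sqrt{h(x)}} \left(\frac{1}{2}[w - h(x)][h(x)]^{-1} + 1\right) g\left(\frac{w - h(x)}{\sqrt{h(x)}}\right)}{1 - G\left(\frac{w - h(x)}{\sqrt{h(x)}}\right)} \\ &= -\lim_{x\to\infty} \frac{1}{2\sqrt{2\pi}} \left[2 - \frac{1}{\ln x}\right] \left(\frac{w}{h(x)} + 1\right) \sqrt{\frac{\ln x}{2\ln x - \ln\ln x}}\ \exp\left(w - \frac{w^2}{2h(x)}\right) \\ &= -\frac{\exp(w)}{2\sqrt{\pi}} , \end{aligned} \tag{66}$$

where the second step uses $x\, h'(x) = 2 - \frac{1}{\ln x}$ and $x \exp(-h(x)/2) = \sqrt{\ln x}$.

Equations (65) and (66) in conjunction with the continuity of $\ln(\cdot)$ establish that

$$\lim_{m\to\infty} Q_{1:m}(w;m) = 1 - \exp\left(-\exp(w - \ln 2\sqrt{\pi})\right) , \tag{67}$$

which is the desired result.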



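B Proof of Lemma 2

From the definitions of $n_t$ and $\bar n_t$ for all $t \in \{1,\dots,\tau\}$ we have $|\mathcal{L}_t| = n_t + \bar n_t$. Therefore, for any time $t \in \{1,\dots,\tau-1\}$ at which the refinement action is taken (i.e., $\psi(t) = 1$) the condition

$$\lceil (1 - \tilde\delta) n_t \rceil \leq n_{t+1}$$

for some $\tilde\delta \in (0,1)$ is equivalent to having

$$\bar n_{t+1} \leq |\mathcal{L}_{t+1}| - \lceil (1 - \tilde\delta) n_t \rceil .$$

By taking into account the distributions of the order statistics $\bar U_t^j$ and $U_t^j$ given in (31) and (32), respectively, the event that the retained events after a refinement action at time $t$ contain $\lceil (1 - \tilde\delta) n_t \rceil$ rare events is equivalent to

$$U_t^{\lceil (1-\tilde\delta) n_t \rceil} \leq \bar U_t^{|\mathcal{L}_{t+1}| - \lceil (1-\tilde\delta) n_t \rceil + 1} . \tag{68}$$

In order to analyze this probability, in the following two lemmas we show that the order statistics $U_t^{\lceil (1-\tilde\delta) n_t \rceil}$ and $\bar U_t^{|\mathcal{L}_{t+1}| - \lceil (1-\tilde\delta) n_t \rceil + 1}$ are of central orders and specify their distributions.

Lemma 9 The order statistic $U_t^{\lceil (1-\tilde\delta) n_t \rceil}$ is of central order and is almost surely distributed as $\mathcal{N}(M_t, \sigma_t^2)$ for

$$M_t \triangleq \mu_1 t + \sqrt{2t}\,\mathrm{erf}^{-1}\left(2(1-\tilde\delta) - 1\right) \quad \text{and} \quad \sigma_t^2 \triangleq \frac{2\pi t\, \tilde\delta (1-\tilde\delta)}{n_t \exp\left(-2\left[\mathrm{erf}^{-1}(2\tilde\delta - 1)\right]^2\right)} . \tag{69}$$

Proof: Note that in the asymptote of large $n$ we almost surely have $n_t = \omega(1)$ and Definition 3 confirms that $U_t^{\lceil (1-\tilde\delta) n_t \rceil}$ is an order statistic of central order from a sequence of $n_t$ i.i.d. random variables with the parent distribution $\mathcal{N}(\mu_1 t, t)$. By using Theorem 3 in the asymptote of large $n$ we have $U_t^{\lceil (1-\tilde\delta) n_t \rceil} \sim \mathcal{N}(M_t, \sigma_t^2)$ for $M_t$ and $\sigma_t^2$ defined in (69).‡ ■

Lemma 10 The order statistic $\bar U_t^{|\mathcal{L}_{t+1}| - \lceil (1-\tilde\delta) n_t \rceil + 1}$ is of central order and is almost surely distributed as $\mathcal{N}(\bar M_t, \bar\sigma_t^2)$ for

$$\bar M_t \triangleq \mu_0 t + \sqrt{2t}\,\mathrm{erf}^{-1}(2\alpha - 1) , \quad \text{and} \quad \bar\sigma_t^2 \triangleq \frac{2\pi t\, \alpha (1-\alpha)}{\bar n_t \exp\left(-2\left[\mathrm{erf}^{-1}(2\alpha - 1)\right]^2\right)} , \tag{70}$$

where $\mathrm{erf}(\cdot)$ denotes the Gauss error function.

‡ For computing $M_t$ we have used the property that when $G$ denotes the cdf of the distribution $\mathcal{N}(\mu, \sigma^2)$, we have $G^{-1}(a) = \mu + \sigma\sqrt{2}\,\mathrm{erf}^{-1}(2a - 1)$ for $a \in (0,1)$.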

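Proof: The proof follows the same line of arguments as in the proof of Lemma 9, noting that in the asymptote of large $n$

$$\frac{|\mathcal{L}_{t+1}| - \lceil (1-\tilde\delta) n_t \rceil + 1}{\bar n_t} = \frac{\lfloor \alpha (n_t + \bar n_t - T_n) \rfloor - \lceil (1-\tilde\delta) n_t \rceil + 1}{\bar n_t} \to \alpha$$

holds almost surely. ■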
Next, by defining 



i.2 ^ ^2 , -2 



(71) 



and by using (70)-(69) we obtain after a refinement action at time t (i.e., ipit) — 1) 

exp(-tV2) 



at > 1 • exp < — • 

27r <Tt ^ 2 V o-t 



where we have used exp(— dt <^ •exp(— a:^/2) for a; > 0. Hence, for the number of rare events 

discarded by the refinement actions, collectively, we have 

P(n^ > ni(l-J)^ I m) > P > ni J| (1 - 5) | ni 

\ t:V.(t) = l 



> [| P(nt+i > nt(\-l) I nt) 

t:^(t) = l 



^ n 

t:^(t) = l 



1 ; exp<^ -- • 

at 2 V CTt 



> 



^ Mt-Mt 1 (Mt-Mt 
1 ; '5xp<^ • ; 

(Tl 2 V (Tl 



K 



(72) 



where the first inequality holds since 'Ylit-ii>{t)=i — ^ ^^'^ ^^^^ inequality holds as at is increasing in t. By 
setting 5 = 1 — (1 — S)^^^ , Eq. (72) indicates that a sufficient condition that ensures Ur > {I — S)ni almost 
surely is that 



Mt^Mt^ 
^1 



i> 1 



(73) 



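Now, from (71) recall that the variance term defined there has two summands, where one scales with $1/\bar n_1$ and the other one scales with $1/n_1$. By noting that the stopping time can be bounded by a constant (Lemma 4) it can be readily verified that (73) holds if

$$\mu_0 - \mu_1 = \omega\left(\sqrt{\frac{1}{\bar n_1} + \frac{1}{n_1}}\right) ,$$

which in turn holds almost surely if $\mu_0 - \mu_1 = \omega\left(n^{-\varepsilon_n/2}\right)$.

C Proof of Lemma 3

By using (33) we have the connection between the error probability $P_n(\tau,\psi(\tau))$ for given $\tau$ and $\psi(\tau)$ and low-order statistics of two sequences of sizes $\bar n_\tau$ and $n_\tau$ with Gaussian parent distributions with different means. According to (33) we have

$$P_n(\tau,\psi(\tau)) = \mathbb{P}\left(U_\tau^{T_n} > \bar U_\tau^{1}\right) .$$

In order to find the asymptotic distribution and the associated minimal domain of attraction of these order statistics we use the result of Lemma 1. For this purpose let us define $h : \{x \in \mathbb{R} : x > 1\} \to \mathbb{R}_+$ as

$$h(x) = 2\ln x - \ln\ln x .$$

Based on the definition and distribution of $Z_\tau^i$ for $i \in \mathcal{L}_\tau$ given in (27)-(30) let us define

$$\forall i \in \mathcal{L}_\tau^0: \quad \bar W_i \triangleq h(\bar n_\tau) + \sqrt{h(\bar n_\tau)} \cdot \underbrace{\frac{Z_\tau^i - \tau\mu_0}{\sqrt{\tau}}}_{\sim\, \mathcal{N}(0,1)} , \tag{74}$$
$$\text{and} \ \forall i \in \mathcal{L}_\tau^1: \quad W_i \triangleq h(n_\tau) + \sqrt{h(n_\tau)} \cdot \underbrace{\frac{Z_\tau^i - \tau\mu_1}{\sqrt{\tau}}}_{\sim\, \mathcal{N}(0,1)} . \tag{75}$$

Given the definitions in (74)-(75) we obtain

$$\begin{aligned} P_n(\tau,\psi(\tau)) &= \mathbb{P}\left(U_\tau^{T_n} > \bar U_\tau^{1}\right) \\ &= \mathbb{P}\left(\frac{W_{T_n:n_\tau} - h(n_\tau)}{\sqrt{h(n_\tau)}} > \frac{\bar W_{1:\bar n_\tau} - h(\bar n_\tau)}{\sqrt{h(\bar n_\tau)}} + \sqrt{\tau}\,(\mu_0 - \mu_1)\right) \qquad (76) \\ &= \mathbb{P}\left(W_{T_n:n_\tau} > A_n \cdot \bar W_{1:\bar n_\tau} + B_n\right) , \qquad (77) \end{aligned}$$

where we have defined

$$A_n \triangleq \sqrt{\frac{h(n_\tau)}{h(\bar n_\tau)}} \quad \text{and} \quad B_n \triangleq \sqrt{h(n_\tau)}\left[\sqrt{h(n_\tau)} - \sqrt{h(\bar n_\tau)} + \sqrt{\tau}\,(\mu_0 - \mu_1)\right] . \tag{78}$$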



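In order to assess $P_n(\tau,\psi(\tau))$ given in (77) we find the distributions of $W_{T_n:n_\tau}$ and $\bar W_{1:\bar n_\tau}$. For this purpose, according to Lemma 1 the cdfs of $W_{1:n_\tau}$ and $\bar W_{1:\bar n_\tau}$, denoted by $Q_{1:n_\tau}(\cdot\,; n_\tau)$ and $\bar Q_{1:\bar n_\tau}(\cdot\,; \bar n_\tau)$, respectively, satisfy

$$\lim_{n_\tau\to\infty} Q_{1:n_\tau}(w; n_\tau) = 1 - \exp\left(-\exp(w - \ln 2\sqrt{\pi})\right) ,$$
$$\text{and} \quad \lim_{\bar n_\tau\to\infty} \bar Q_{1:\bar n_\tau}(w; \bar n_\tau) = 1 - \exp\left(-\exp(w - \ln 2\sqrt{\pi})\right) .$$

Furthermore, given the asymptotic distributions of the first order statistics, by using Theorem 2 we can find the distributions of the $T_n^{th}$ (low-order) statistics. By denoting the cdfs of $W_{T_n:n_\tau}$ and $\bar W_{T_n:\bar n_\tau}$ by $Q_{T_n:n_\tau}(w; n_\tau)$ and $\bar Q_{T_n:\bar n_\tau}(w; \bar n_\tau)$, respectively, we have

$$\lim_{n_\tau\to\infty} Q_{T_n:n_\tau}(w + \zeta; n_\tau) = 1 - \sum_{i=0}^{T_n-1} \frac{\exp(iw - \exp(w))}{i!} , \quad \forall w \in \mathbb{R} , \tag{79}$$
$$\text{and} \quad \lim_{\bar n_\tau\to\infty} \bar Q_{T_n:\bar n_\tau}(w + \zeta; \bar n_\tau) = 1 - \sum_{i=0}^{T_n-1} \frac{\exp(iw - \exp(w))}{i!} , \quad \forall w \in \mathbb{R} , \tag{80}$$

where

$$\zeta \triangleq \ln 2\sqrt{\pi} .$$

The associated pdfs, consequently, are given by

$$\lim_{n_\tau\to\infty} q_{T_n:n_\tau}(w + \zeta; n_\tau) = \sum_{i=0}^{T_n-1} \frac{(\exp(w) - i)\exp(iw - \exp(w))}{i!} , \quad \forall w \in \mathbb{R} , \tag{81}$$
$$\text{and} \quad \lim_{\bar n_\tau\to\infty} \bar q_{T_n:\bar n_\tau}(w + \zeta; \bar n_\tau) = \sum_{i=0}^{T_n-1} \frac{(\exp(w) - i)\exp(iw - \exp(w))}{i!} , \quad \forall w \in \mathbb{R} . \tag{82}$$

By using the distributions given in (79)-(82) for given $\tau$ and $\psi(\tau)$ we find lower and upper bounds on $P_n(\tau,\psi(\tau))$. These bounds, in turn, serve as the basis for finding necessary and sufficient conditions on the scaling laws of $(\mu_0 - \mu_1)$ that guarantee $P_n(\tau,\psi(\tau)) \xrightarrow{n\to\infty} 0$.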



Lower bound: By replacing $W_{T_n:n_\tau}$ by $W_{1:n_\tau}$, based on (77), $P_n(\tau,\psi(\tau))$ can be lower bounded for sufficiently large $n$ as follows:

Pn(T, V5(t)) > v(wi..n^ > An ■ Wi..n^ + B^j 

/O poo 
Qi:7ir{AnW + Bn) qi-n^(w) dw - Qi-n^A^w + B„ qi-n^{w) dw 
oo ""^ " ' Jo " ' 

< B„ < uj+B„ 

> 1- / Ql;n^{Bn) qi-n^{w) dw - / Q {w + Bn) Ql-fi^ (w) dw (83) 



— OO 



1 - Ql;n. (Bn) (0) + / (1 " Ql;„. (u' + B„)) Ql-n^ (w) dw - {1 - Qi,n^ (0)) 





= exp(-(exp(i?„-,)))(l-exp(-exp(-0)+'''^^^|'^^) , (84) 

[ exp(S„) + 1 J 

This strictly positive lower bound on $P_n(\tau,\psi(\tau))$ approaches zero if and only if $B_n = \omega(1)$.
Upper bound: From (33) for sufficiently large $n$ we have



P„(T,^(r)) = V{W°>A„-Wl^+B,, 



poo 

) Jo 



< 1 



= [1-QT„;„.(S„)][1-Q};fi^(0)] + /" exp(w - exp(w + B„) - exp(w;)) du. , 
which can be easily shown to approach zero if and only if $B_n = \omega(1)$.

Therefore, a necessary and sufficient condition for having both the lower and upper bounds on $P_n(\tau,\psi(\tau))$ approach zero is that $B_n = \omega(1)$, which subsequently becomes a necessary and sufficient condition for $P_n(\tau,\psi(\tau)) \xrightarrow{n\to\infty} 0$. According to (78), $B_n = \omega(1)$ is equivalent to

B„ 



h{nT)V'2, Inn \/i(nT-)V2 In 



f 



{78} 



\/2 In n ^2 In i 



f 



/i(n^)V2hi 



f 



where $h(\cdot)$ is defined in (34) and it holds in the asymptote of large $n$ almost surely if and only if

y/r ■ .Jr^ - {I - ./e^i) = uj 

or equivalently 



n 



D Proof of Lemma [H 

From the definition of ipi^) given in ^ we have 

Vte{l,...,r-1}: lA+il = |£t|-l^(t)=o + (La(IA|-r„)J +T„). 1^,(4)^1 

> |/:t|-l^(t)=o + (a|A|"l)-l^(t)=i , (85) 

where is the indicator function. Following this relationship, the number of sequences retained and fed 
to the detector is related to the initial number of sequences n according to 



WG{l,...,r-l}: lAI > f* a^-|A| - 

^-^ 1 — a 



a^' • n - 



I -a' 
1 -a 



(86) 
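The constraint $\sum_{t=1}^{\tau}|\mathcal{L}_t| \le S\cdot n$ in conjunction with (86) provides

$$S\cdot n \;\ge\; \sum_{t=1}^{\tau}|\mathcal{L}_t| \;\ge\; \tau\cdot\Big(\alpha^{K}\cdot n - \frac{1-\alpha^{K}}{1-\alpha}\Big), \qquad (87)$$

which shows that in the asymptote of large $n$ we have $\tau \le S/\alpha^{K}$.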
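The recursion in (85) and the resulting lower bound on the retained-set size can be checked numerically. The sketch below is only an illustration under assumed values: the function names and the parameters `n`, `alpha`, `T_n`, and the switching sequence are hypothetical choices, not values from the paper.

```python
import math

def retained_sizes(n, alpha, T_n, switches):
    """Sizes |L_1|, |L_2|, ... generated by the recursion in (85).

    switches[t] == 1 means a refinement action is taken, 0 means no change.
    """
    sizes = [n]
    for psi in switches:
        cur = sizes[-1]
        nxt = math.floor(alpha * (cur - T_n)) + T_n if psi == 1 else cur
        sizes.append(nxt)
    return sizes

def size_lower_bound(n, alpha, K):
    """alpha^K * n - (1 - alpha^K) / (1 - alpha), the bound after at most K refinements."""
    return alpha**K * n - (1 - alpha**K) / (1 - alpha)

# Hypothetical example values (assumptions for illustration only).
n, alpha, T_n = 100_000, 0.6, 5
switches = [1, 0, 1, 1, 0, 1, 0, 0]      # contains K = 4 refinement actions
K = sum(switches)

sizes = retained_sizes(n, alpha, T_n, switches)
bound = size_lower_bound(n, alpha, K)
bound_holds = all(s >= bound for s in sizes)
print(sizes, bound, bound_holds)
```

Every retained-set size stays above the bound because each refinement shrinks the set by at most a factor of $\alpha$ plus one sequence lost to the floor operation.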



E Proof of Lemma [5] 

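Let us define $Y_1,\dots,Y_m$ as i.i.d. random variables distributed according to $\chi^2(k)$. For the cdf of $\chi^2(k)$
random variables we have

$$F_Y(y) = \mathbb{P}(Y_i\le y) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\int_0^{y} t^{k/2-1}\exp(-t/2)\,dt. \qquad (88)$$

As for $t\in[0,y]$ we have $\exp(-y/2)\le\exp(-t/2)\le 1$, we obtain the following bounds on $F_Y(y)$:

$$\frac{\exp(-y/2)}{2^{k/2}\,\Gamma(k/2)}\int_0^{y} t^{k/2-1}\,dt \;\le\; F_Y(y) \;\le\; \frac{1}{2^{k/2}\,\Gamma(k/2)}\int_0^{y} t^{k/2-1}\,dt, \qquad (89)$$

which subsequently provides

$$\frac{1}{2^{k/2}\,\Gamma(k/2+1)}\cdot\exp(-y/2)\,y^{k/2} \;\le\; F_Y(y) \;\le\; \frac{1}{2^{k/2}\,\Gamma(k/2+1)}\cdot y^{k/2}. \qquad (90)$$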



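Next, by setting

$$a_m = 0, \quad\text{and}\quad b_m = \frac{1}{2}\Big(\frac{m}{\Gamma(k/2+1)}\Big)^{2/k},$$

and defining

$$W_i = a_m + b_m Y_i, \qquad \forall m\in\{2,3,\dots\},$$

for the cdf of $W_i$, denoted by $F_W(w)$, we have $F_W(w) = \mathbb{P}(W_i\le w) = F_Y(w/b_m)$. By invoking the
inequalities in (90) we find the following bounds on $F_W(w)$:

$$\frac{w^{k/2}}{m}\cdot\exp\Big(-w\cdot\Big[\frac{\Gamma(k/2+1)}{m}\Big]^{2/k}\Big) \;\le\; F_W(w) \;\le\; \frac{w^{k/2}}{m}, \qquad (91)$$

which implies that in the asymptote of large $m$ we have $F_W(w) = \frac{w^{k/2}}{m}(1+o(1))$. Therefore, for the cdf of
$W_{1:m}$ (the first order statistic of $\{W_1,\dots,W_m\}$), denoted by $Q_{1:m}(w;m)$, in the asymptote of large $m$ we have

$$Q_{1:m}(w;m) = 1-\big[1-F_W(w)\big]^{m} = 1-\Big[1-\frac{w^{k/2}}{m}(1+o(1))\Big]^{m} \xrightarrow{m\to\infty} 1-\exp\big(-w^{k/2}\big).$$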
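The convergence of the normalized minimum to the limit $1-\exp(-w^{k/2})$ can be illustrated numerically. The sketch below is a minimal check, assuming the scaling $b_m = \frac{1}{2}(m/\Gamma(k/2+1))^{2/k}$ with $a_m = 0$. To stay within the standard library it relies on the closed-form $\chi^2(k)$ cdf that is available only for even $k$; the function names are hypothetical.

```python
import math

def chi2_cdf_even(y, k):
    """Closed-form chi-square cdf; valid only for even degrees of freedom k."""
    half = k // 2
    tail = math.exp(-y / 2) * sum((y / 2) ** j / math.factorial(j) for j in range(half))
    return 1.0 - tail

def min_cdf(w, k, m):
    """Exact cdf of the first order statistic of m scaled chi-square variables."""
    b_m = 0.5 * (m / math.gamma(k / 2 + 1)) ** (2 / k)
    F = chi2_cdf_even(w / b_m, k)
    # 1 - (1 - F)^m, evaluated through log1p for numerical stability
    return 1.0 - math.exp(m * math.log1p(-F))

def weibull_limit(w, k):
    """Limiting cdf 1 - exp(-w^(k/2))."""
    return 1.0 - math.exp(-w ** (k / 2))

for m in (10**2, 10**4, 10**6):
    print(m, min_cdf(1.0, 4, m), weibull_limit(1.0, 4))
```

For $k=2$ the scaled minimum is exactly exponential for every $m$, so the two curves coincide; for larger even $k$ the gap shrinks as $m$ grows.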
F Proof of Lemma [6]

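Let us define $Y_1,\dots,Y_m$ as i.i.d. random variables distributed according to $\chi^2(k)$ with cdf $F_Y(y)$ and define
the function $u:\{x\in\mathbb{R}: x>1\}\to\mathbb{R}_+$ as

$$u(x) = \ln x + \Big(\frac{k}{2}-1\Big)\ln\ln x. \qquad (92)$$

By defining $W_i = a_m + b_m Y_i$ and setting

$$a_m = -u(m), \quad\text{and}\quad b_m = \frac{1}{2},$$

for the cdf of $W_{m:m}$, denoted by $Q_{m:m}(w;m)$, we have

$$Q_{m:m}(w;m) = \mathbb{P}(W_{m:m}\le w) = \Big[F_Y\Big(\frac{w-a_m}{b_m}\Big)\Big]^{m} \overset{(92)}{=} \Big[F_Y\big(2(w+u(m))\big)\Big]^{m}. \qquad (93)$$

We next show that

$$\lim_{x\to\infty} x\cdot\log\big[F_Y\big(2(w+u(x))\big)\big] = -\frac{\exp(-w)}{\Gamma(k/2)}. \qquad (94)$$

By using L'Hopital's rule, and noting that $F_Y(2(w+u(x)))\to 1$, we obtain

$$\begin{aligned}
\lim_{x\to\infty}\frac{\log\big[F_Y(2(w+u(x)))\big]}{1/x}
&= \lim_{x\to\infty}\frac{2u'(x)\,f_Y\big(2(w+u(x))\big)}{-1/x^{2}}\\
&= \lim_{x\to\infty}\frac{-x^{2}\cdot\frac{2}{x}\big(1+\frac{k/2-1}{\ln x}\big)\cdot 2^{k/2-1}\big(w+\ln x+(\frac{k}{2}-1)\ln\ln x\big)^{k/2-1}\cdot\frac{1}{x}\cdot\big(\frac{1}{\ln x}\big)^{k/2-1}\cdot\exp(-w)}{2^{k/2}\,\Gamma(k/2)}\\
&= -\frac{1}{\Gamma(k/2)}\exp(-w)\lim_{x\to\infty}\Big(\frac{w+\ln x+(\frac{k}{2}-1)\ln\ln x}{\ln x}\Big)^{k/2-1}\\
&= -\frac{1}{\Gamma(k/2)}\exp(-w), \qquad (95)
\end{aligned}$$

where $f_Y$ denotes the pdf of $\chi^2(k)$.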
Equations (94) and (95) demonstrate that

$$\lim_{m\to\infty} Q_{m:m}(w;m) = \exp\Big(-\frac{1}{\Gamma(k/2)}\cdot\exp(-w)\Big), \qquad (96)$$

which is the desired result.
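The limit in (96) can be examined numerically. The sketch below is a minimal illustration, assuming the centering $u(m) = \ln m + (k/2-1)\ln\ln m$ and the scaling $1/2$. As a simplification it uses the closed-form $\chi^2(k)$ tail that exists only for even $k$; all function names are hypothetical.

```python
import math

def chi2_tail_even(y, k):
    """Closed-form chi-square survival function; valid only for even k."""
    half = k // 2
    return math.exp(-y / 2) * sum((y / 2) ** j / math.factorial(j) for j in range(half))

def u(x, k):
    """Centering term ln x + (k/2 - 1) ln ln x, defined for x > 1."""
    return math.log(x) + (k / 2 - 1) * math.log(math.log(x))

def max_cdf(w, k, m):
    """Exact cdf of the normalized maximum: F_Y(2 (w + u(m)))^m."""
    tail = chi2_tail_even(2 * (w + u(m, k)), k)
    return math.exp(m * math.log1p(-tail))

def gumbel_limit(w, k):
    """Limiting cdf exp(-exp(-w) / Gamma(k/2))."""
    return math.exp(-math.exp(-w) / math.gamma(k / 2))

for m in (10**3, 10**6, 10**12):
    print(m, max_cdf(0.0, 4, m), gumbel_limit(0.0, 4))
```

For $k=2$ the agreement is already tight at moderate $m$. For $k=4$ the gap closes only slowly, since the ratio inside the last limit of the L'Hopital argument approaches one at a logarithmic rate.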



G Proof of Lemma [7] 

From the definitions of $n_t$ and $\bar n_t$, for all $t\in\{1,\dots,\tau\}$ we have $|\mathcal{L}_t| = n_t+\bar n_t$. Therefore, for any time
$t\in\{1,\dots,\tau-1\}$ at which the refinement action is taken (i.e., $\psi(t)=1$), the condition

$$n_t = n_{t+1}$$

is equivalent to having

$$\bar n_{t+1} = |\mathcal{L}_{t+1}| - n_t.$$



By taking into account the distributions of the order statistics given in (46) and (47),
the event that the $|\mathcal{L}_{t+1}|$ retained events after a refinement action at time $t$ contain $n_t$ rare events is
equivalent to

(97) 



In order to analyze this probability, in the following lemma we show that the order statistic $V_{|\mathcal{L}_{t+1}|-n_{t+1}}$
is of central order and specify its distribution.



Lemma 11 The order statistic V,*r i i t is of central order and 



where 



Ar^Af[n,a^) , for ^ = G ^(a), and cr^ - "''^ 



nt\-g{G-^{a)W 



(98) 



(99) 



and $G$ and $g$ denote the cdf and pdf of $\chi^2(t)$.

Proof: The proof follows from the same line of argument as in the proof of Lemma [9].



Therefore from (98)-(99) we obtain when ilj{t) = 1 
P (nt+i = nt I nt) = 



A 

In nt < — — ■ a ~\mit ) f a (a) da , 

2Ai 2Ai 1 1 JAy J 



(100) 



where $f_A$ denotes the pdf of $A$. Based on the definition of $V^*$ given in (46), it is the highest order statistic
of a sequence of $n_t$ i.i.d. random variables with the parent distribution $\chi^2(t)$. Hence, by using Lemma [6] we
obtain



^-lnn,<u,)=exp(^.cxp(-u;) 



Therefore, from (100) and (101) we get



'(rit+i = nt) 



exp [ • exp (irnit — ^^-^ ' ) ) Ia^'^) ^'^ 



r(i/2) 



2Ai 



(101) 



(102) 



^ i/2 '''p(r(t72) ^^-^^"^"^^ 



2^41 2 



(103) 



exp I T^TTTT^ • exp (^Innt " ^ ' 



r(i/2) 



exp I T^TTJ^ ■ exp flnnt - • ^ ) ) P(|^ - p\ < fl/2) 



T{t/2) 



2Ai 2 



1 - 



(104) 



where (102) holds by narrowing the interval of integral, (103) holds as the integrand in (102) takes its smallest
value at $a=\mu/2$, and (104) holds according to Chebyshev's inequality. Next, note that in the asymptote of
large $n$ almost surely we have $n_t=\omega(1)$, and consequently, $\sigma^2\xrightarrow{n\to\infty}0$. Hence,



2Ai 2 

is a sufficient condition for ensuring P {rit+i — nt 
asymptote of large n almost surely when ^ = a;(lnne„) or equivalently ^ = a;(e„ Inn). 



(105) 



1. In turn, this condition is satisfied in the 



H Proof of Lemma [8] 



According to (48), the detection error probability is 

P„(r,V;(T)) -P(y^„ > V{ 

In order to find the asymptotic distributions and the associated minimal domains of attraction of these order
statistics we use the result of Lemma [5]. For this purpose, let us define

1 



h{x) 



(106) 
Also, let us set



and Vi e : ^ Z\ 



h{nr) 
h{n-r) 



(107) 
(108) 



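Therefore, according to Lemma [5] the cdfs of $\bar W_{1:\bar n_\tau}$ and $W_{1:n_\tau}$, denoted by $\bar Q_{1:\bar n_\tau}(w;\bar n_\tau)$ and $Q_{1:n_\tau}(w;n_\tau)$,
respectively, satisfy

$$\lim_{\bar n_\tau\to\infty}\bar Q_{1:\bar n_\tau}(w;\bar n_\tau) = 1-\exp\big(-w^{\tau/2}\big) \quad\text{and}\quad \lim_{n_\tau\to\infty} Q_{1:n_\tau}(w;n_\tau) = 1-\exp\big(-w^{\tau/2}\big).$$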



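Furthermore, by using Theorem [2] we can find the distribution of the $T_n$-th order (low-order) statistic $W_{T_n:n_\tau}$
as follows:

$$\lim_{n_\tau\to\infty} Q_{T_n:n_\tau}(w) = 1-\exp\big(-w^{\tau/2}\big)\sum_{i=0}^{T_n-1}\frac{w^{i\tau/2}}{i!}, \qquad \forall w\in\mathbb{R}_+. \qquad (109)$$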



Given the definitions in (107l-(108) we obtain 

P„(r,^(T)) = p(y^^>Fi-) 



A) h{nr) 

Ai h[flr) 



1-P Wt„:„, < W^l:n, 



By setting 



e = 



Ao h{nr) 
Ai h{nr) 



(110) 



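we get

$$\begin{aligned}
P_n(\tau,\psi(\tau)) &= 1-\int_0^{\infty}\!\!\int_0^{\theta w}\bar q_{1:\bar n_\tau}(w;\bar n_\tau)\,q_{T_n:n_\tau}(x;n_\tau)\,dx\,dw\\
&= 1-\int_0^{\infty}\bar q_{1:\bar n_\tau}(w;\bar n_\tau)\Big[1-\exp\big(-w^{\tau/2}\theta^{\tau/2}\big)\sum_{i=0}^{T_n-1}\frac{w^{i\tau/2}\,\theta^{i\tau/2}}{i!}\Big]dw\\
&= \sum_{i=0}^{T_n-1}\frac{\theta^{i\tau/2}}{i!}\int_0^{\infty}\frac{\tau}{2}\exp\big(-w^{\tau/2}(1+\theta^{\tau/2})\big)\,w^{i\tau/2+\tau/2-1}\,dw.
\end{aligned}$$

By further setting

$$s = w^{\tau/2}\big(1+\theta^{\tau/2}\big)$$

we find

$$\begin{aligned}
P_n(\tau,\psi(\tau)) &= \sum_{i=0}^{T_n-1}\frac{\theta^{i\tau/2}}{(1+\theta^{\tau/2})^{i+1}}\cdot\frac{1}{i!}\underbrace{\int_0^{\infty}s^{i}\exp(-s)\,ds}_{=\,\Gamma(i+1)\,=\,i!}\\
&= \frac{1}{1+\theta^{\tau/2}}\sum_{i=0}^{T_n-1}\Big(\frac{\theta^{\tau/2}}{1+\theta^{\tau/2}}\Big)^{i}\\
&= 1-\Big(\frac{\theta^{\tau/2}}{1+\theta^{\tau/2}}\Big)^{T_n}. \qquad (111)
\end{aligned}$$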

Hence, the requirement $P_n(\tau,\psi(\tau))\xrightarrow{n\to\infty}0$ is equivalent to $\theta^{\tau/2}=\omega(1)$, which by taking into account
(106) and (110), is in turn equivalent to



(112) 



By taking into account that ^ "^°°> c-n^ for some constant c G M+, this condition can be equivalently 
cast as 

. ^ 2(1-^ ^ (113) 



Ini 



I Proof of Theorem [4] 

From the definition of $P_n(S,K)$, the optimal stopping time $\tau$ and the optimal switching sequence $\psi(\tau)$ are
the minimizers of $P_n(\tau,\psi(\tau))$ within the constraints on the sampling budget and the number of switchings.
In the sequel we first prove that the impacts of the stopping time and the switching sequence are embedded
in the term $\tau$, and that minimizing $P_n(\tau,\psi(\tau))$ reduces to maximizing $\tau$. We provide separate proofs for the mean
and variance cases.

Mean: 



As shown in (33) and (76) we have



P«(T,Vi(r)) 



+ V'^{fJ-o- Ml) 



(114) 
where U^^ and C/f denote the order statistics of the sets {Zl : t e C\} and {Zl : t e L^}. Furthermore, as



shown in (74) and (75) we also have



and yi £ Z\ 



Z\-lll-T 



AA(0,1) 
AA(0,1) 



Hence, the distributions of these normalized order statistics are independent of $\tau$, and the effect of $\tau$ on $P_n(\tau,\psi(\tau))$
is captured entirely by the term $\sqrt{\tau}(\mu_0-\mu_1)$. Therefore, the error probability is minimized when $\tau$ is
maximized.
Variance: 



By recalling (106), (110), and (111) we have



P„(t,i^(t)) = 1- 



1 + 



where from (110) and (106) we have 



Qr/2 



Ai J fir 

Therefore, minimizing $P_n(\tau,\psi(\tau))$ is equivalent to maximizing $\theta^{\tau/2}$, which happens when $\tau$ is maximized.
Next, we obtain the choices of $\tau$ and $\psi(\tau)$ that maximize $\tau$. Let us denote the optimal number of refinement
actions by $K^*\le K$. We argue that the optimal switching sequence that maximizes $\tau$ is of the form

$$\psi(\tau) = \{\underbrace{1,\dots,1}_{K^*},0,\dots,0\}. \qquad (115)$$

To prove this we first show that the aggregate number of observations taken corresponding to the switching 
sequence 

^(r) ^ {1^1 , , 1 , ^2} (116) 
is strictly smaller than that corresponding to the switching sequence 

^(r) ^ {V'l , 1 , , ^2} , (117) 

where $\psi_1$ and $\psi_2$ are switching sub-sequences. Note that the sequences in (116) and (117) differ only in two
switch values. By denoting the lengths of $\psi_1$ and $\psi_2$ by $|\psi_1|$ and $|\psi_2|$, respectively, the number of samples
taken according to the switching sequence in (116) is

T IV^ll + l \i>l\ + \^2\+2 

^ t=|V>i|-|-4 



times t=l,...,|V>i 1 + 1 



time t=|-0i|+2 



time t=|i/'i|+3 






and the number of samples taken according to the switching sequence in (117) is



t=i 



\Ct 



\i>i\+i \iPi\+\iP2\+2 

\Ct\ + L«(|/:^^|+J-T„)j+T„ + La(l%i+il-rn)J+rn+ E 1^*1- 
. , : — ' : — ' t=iv.ii+4 

^ ; time t=|i/ji 1+2 time t=|i/;i 1+3 

times t=l,...,|i/)i|+l 

(119) 



By comparing (118) and (119), all the summands of $\sum_{t=1}^{\tau}|\mathcal{L}_t|$ are identical except the second terms, where
it is strictly smaller in (118). By following the same line of argument, it is concluded that among all
switching sequences that contain $K^*$ switches with value 1, the sequence that has its initial $K^*$ switches
set to 1 takes the least number of samples. Therefore, among all switching sequences with $K^*$ refinement
actions, the sequence of the form (115) leaves the most unused sampling resources, which in turn can be
exploited to take further samples and delay (increase) the stopping time.
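The claim that placing all refinement actions first minimizes the number of samples can be checked by brute force on a small instance. The sketch below assumes the update $|\mathcal{L}_{t+1}| = \lfloor\alpha(|\mathcal{L}_t|-T_n)\rfloor+T_n$ whenever $\psi(t)=1$ and an unchanged set otherwise. The function names and numerical values are hypothetical illustrations, not taken from the paper.

```python
import math
from itertools import combinations

def total_samples(n, alpha, T_n, switches):
    """Sum of |L_t| over t = 1..tau, where switches holds psi(1), ..., psi(tau - 1)."""
    size = n
    total = size                      # samples taken at time t = 1
    for psi in switches:
        if psi == 1:                  # refinement action before the next time step
            size = math.floor(alpha * (size - T_n)) + T_n
        total += size
    return total

def cheapest_placement(n, alpha, T_n, tau, K_star):
    """Enumerate all placements of K_star refinements; return (cost, switches) of the cheapest."""
    best = None
    for ones in combinations(range(tau - 1), K_star):
        switches = [1 if t in ones else 0 for t in range(tau - 1)]
        cost = total_samples(n, alpha, T_n, switches)
        if best is None or cost < best[0]:
            best = (cost, switches)
    return best

# Hypothetical example values (assumptions for illustration only).
cost, switches = cheapest_placement(n=10_000, alpha=0.5, T_n=3, tau=7, K_star=3)
print(cost, switches)
```

Because the refinement map is nondecreasing in the set size and never enlarges the set, refining earlier makes every later $|\mathcal{L}_t|$ no larger, which is why the front-loaded sequence wins the enumeration.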



Next, we need to determine the optimal number of refinement actions $K^*$. By using (14), the following
equation delineates the number of sequences observed at time $t$.



(120) 



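Hence, by invoking that the switching sequence is of form (115), we obtain

$$\sum_{t=1}^{\tau}|\mathcal{L}_t| = \sum_{t=1}^{K^*}\big(\lfloor\alpha^{t-1}(n-T_n)\rfloor+T_n\big) + (\tau-K^*)\big(\lfloor\alpha^{K^*}(n-T_n)\rfloor+T_n\big). \qquad (121)$$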
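The budget accounting in (121) can be turned into a small calculation of the largest stopping time that a given number of front-loaded refinement actions allows. The sketch below is an illustration only: it assumes the hard budget $\sum_{t=1}^{\tau}|\mathcal{L}_t|\le S\cdot n$ and $T_n\ge 1$, and the function names and numerical values are hypothetical.

```python
import math

def stage_size(n, alpha, T_n, j):
    """Retained-set size after j refinement actions, in the form used in (121)."""
    return math.floor(alpha**j * (n - T_n)) + T_n

def samples_used(n, alpha, T_n, tau, K_star):
    """Right-hand side of (121) for stopping time tau >= K_star."""
    spent = sum(stage_size(n, alpha, T_n, t - 1) for t in range(1, K_star + 1))
    return spent + (tau - K_star) * stage_size(n, alpha, T_n, K_star)

def largest_stopping_time(n, alpha, T_n, S, K_star):
    """Largest tau whose sample count (121) fits in the budget S * n, or None."""
    budget = S * n
    spent = sum(stage_size(n, alpha, T_n, t - 1) for t in range(1, K_star + 1))
    if spent > budget:
        return None                   # budget is exhausted before all refinements finish
    per_step = stage_size(n, alpha, T_n, K_star)
    return K_star + int((budget - spent) // per_step)

# Hypothetical example values (assumptions for illustration only).
n, alpha, T_n, S = 10_000, 0.5, 3, 4
for K_star in range(0, 4):
    print(K_star, largest_stopping_time(n, alpha, T_n, S, K_star))
```

Comparing the printed stopping times across candidate values of $K^*$ shows how the refinement count trades early sampling cost against a smaller per-step cost afterwards.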

Furthermore, by taking into account the hard constraint on the sampling resources, i.e., $\sum_{t=1}^{\tau}|\mathcal{L}_t|\le S\cdot n$,
in the asymptote of large $n$ we have



- ( [a'-Hn - T„) + T„J ) 



K* = 



5- 



1 — a' 



La^*(n-r„)J +T„ 
Also, by noting that a E (0, 1) it can be readily shown that for x G {0,N} 

1 - a'"^ 



1 -a 



(122) 



1 - a 



is increasing in x when a < 1 — ^ and decreasing in x when a > 1 — ^ . As a result 

K , if a < 1 - i 



K* = 



, if a > 1 - 4 



By using (122) the optimal stopping time is



K 

S , 



l-Q 



, if a < 1 - i 
if a > 1 - I 
J Proof of Corollary [1]



By following the same line of argument as in the proof of Theorem [4], it can be readily shown that the impact
of the average sampling budget on the asymptotic error probability $P_n(S,K)$ is captured by $\tau$. Therefore,
achieving identical asymptotic performance $P_n(S,K) = P_n(S_0,0)$ is equivalent to equating the term $\tau$ under
the adaptive and the non-adaptive procedures. By recalling the results of Theorem [4] and (54)-(56), this
latter equivalence can be stated as



-K 



— Sq 



which in turn provides 



-K 



1 - a 



< Sn < S ■ a 



-K 



'K 



1 - a 



(123) 



(124) 



After some simple manipulations we obtain the desired result. 